Abstract



Allocation of balls into bins is a well-studied abstraction for load balancing problems. The literature hosts
numerous results for the sequential (single dimensional) allocation case when $m$ balls are thrown into $n$ bins, such as:
for the multiple choice paradigm the expected gap between the heaviest bin and the average load is
$O\left(\frac{\log\log(n)}{\log(d)}\right)$ [3]; for the $(1+\beta)$ choice paradigm the gap is
$O\left(\frac{\log(n)}{\beta}\right)$ [10]; and for the single choice paradigm the gap is
$O\left(\sqrt{\frac{m\log(n)}{n}}\right)$ [9]. However, for multidimensional balanced allocations very little is known.
Mitzenmacher [6] proved an $O(\log\log(nD))$ gap for the multiple choice strategy and an $O(\log(nD))$ gap for the
single choice paradigm (where $D$ is the total number of dimensions with each ball having exactly $f$ populated
dimensions) under the assumption that for each ball the $f$ dimensions are uniformly distributed over the $D$
dimensions. In this paper we study the symmetric multiple choice process for both unweighted and weighted balls as
well as for both multidimensional and scalar modes. Additionally, we present bounds on the gap for the $(1+\beta)$
choice process with multidimensional balls and bins.

In the first part of this paper, we study multidimensional balanced allocations for the symmetric $d$ choice process
with $m \gg n$ unweighted balls and $n$ bins. We show that for the symmetric $d$ choice process with $m = O(n)$,
the upper bound (assuming uniform distribution of $f$ populated dimensions over $D$ total dimensions) on the gap is
$O(\ln\ln(n))$ w.h.p. This upper bound on the gap is within a $D/f$ factor of the lower bound. This is the first such
tight result along with detailed analysis for the $d$ choice paradigm with multidimensional balls and bins. This
improves upon the best known prior bound of $O(\log\log(nD))$ [6]. For the general case of $m \gg n$ the expected gap
is bounded by $O(\ln\ln(n))$. For variable $f$ and non-uniform distribution of the populated dimensions (using
analysis for weighted balls), we obtain the upper bound on the expected gap as $O(\log(n))$.

Further, for the multiple round parallel balls and bins, using the symmetric $d$-choice process in multidimensional
mode, we show that the gap is also bounded by $O(\log\log(n))$ for $m = O(n)$. The same bound holds for the
expected gap when $m \gg n$.

Our analysis also has the following strong implications for the sequential scalar case. For weighted balls
and bins and the general case $m \gg n$, we show that the upper bound on the expected gap is $O(\log(n))$ (assuming
$E[W] = 1$ and that the second moment of the weight distribution is finite), which improves upon the best prior bound
of $n^c$ ($c$ depends on the weight distribution, which has finite fourth moment) provided in [12]. Our analysis also
provides a much easier and more elegant proof technique (as compared to [4]) for the $O(\log\log(n))$ upper bound on
the gap for scalar unweighted $m \gg n$ balls thrown into $n$ bins using the symmetric multiple choice process.

Moreover, we study multidimensional balanced allocations for the $(1+\beta)$ choice process and the multiple ($d$)
choice process. We show that for the $(1+\beta)$ choice process and $m = O(n)$ the upper bound (assuming uniform
distribution of $f$ populated dimensions over $D$ total dimensions) on the gap is $O\left(\frac{\log(n)}{\beta}\right)$,
which is within a $D/f$ factor of the lower bound. For fixed $f$ with non-uniform distribution and for random $f$ with
Binomial distribution the expected gap remains $O\left(\frac{\log(n)}{\beta}\right)$ and is independent of the total
number of balls thrown, $m$. This is the first such tight result along with detailed analysis for the $(1+\beta)$
paradigm with multidimensional balls and bins.



1 Introduction 

Balls-into-bins processes serve as a useful abstraction for resource balancing tasks in distributed and parallel systems.
Assume $m$ balls are to be put sequentially into $n$ bins, where typically the goal is to minimize the load, measured by
the number of balls, in the most loaded bin. In the classic single choice process each ball is placed in a bin chosen
independently and uniformly at random. For the case of $n$ bins and $m = n$ balls it is well known that the load of the
heaviest bin is at most $(1 + o(1))\frac{\ln(n)}{\ln\ln(n)}$ balls with high probability (w.h.p.). Further, if
$m > n\ln(n)$ then the load in the heaviest bin is at most $\frac{m}{n} + O\left(\sqrt{\frac{m\ln(n)}{n}}\right)$ [9].

An interesting and substantial decrease in the maximum load is achieved by the use of the multiple choice paradigm
(also referred to as the $d$ choice paradigm), given as: Let Greedy$(U, d)$ denote the algorithm where each ball is
inserted into the least loaded among $d \ge 2$ bins, independently sampled from $U$, where $U$ denotes the uniform
distribution over the bins. In a seminal paper Azar et al. [3] proved that when $m = n$ and the balls are inserted by
Greedy$(U, d)$ the heaviest bin has a load of $\frac{\ln\ln(n)}{\ln(d)} + \Theta(1)$ w.h.p. The case $d = 2$ was proved
by Karp et al. in [8], later being generalized by Berenbrink et al. [4] to prove the following:

Theorem 1.1 Let $\gamma$ denote a suitable constant. If $m$ balls are allocated into $n$ bins using Greedy$(U, d)$ with
$d \ge 2$, then the number of bins with load at least $\frac{m}{n} + i + \gamma$ is at most $n \cdot \exp(-d^i)$ with
probability at least $1 - 1/n$ ([4]).

An immediate corollary is that w.h.p. the heaviest bin has a load of $\frac{m}{n} + \frac{\log\log(n)}{\log(d)} + O(1)$.
Thus, the additive gap between the maximum load and the average load is independent of the number of balls thrown.
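The contrast between the single choice process and Greedy$(U, d)$ can be observed empirically. The following is a minimal simulation sketch, not part of the original analysis; the function name `max_gap` and the fixed seed are illustrative assumptions.

```python
import random


def max_gap(n, m, d, seed=0):
    """Throw m balls into n bins with Greedy(U, d) and return the gap.

    Each ball samples d bins independently and uniformly at random and is
    placed in the least loaded of them (d = 1 is the single choice process).
    The gap is the maximum load minus the average load m / n.
    """
    rng = random.Random(seed)
    load = [0] * n
    for _ in range(m):
        choices = [rng.randrange(n) for _ in range(d)]
        # Ties are broken by the order in which the bins were sampled.
        target = min(choices, key=load.__getitem__)
        load[target] += 1
    return max(load) - m / n
```

For example, comparing `max_gap(1000, 20000, 1)` with `max_gap(1000, 20000, 2)` typically shows a much smaller gap for $d = 2$, in line with the bounds quoted above.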

The multiple-choice paradigm and balls-and-bins models have several interesting applications. In particular, the
two-choice paradigm can be used to reduce the maximum time required to search a hash table. If, instead of using
a single perfectly random hash function as in a typical hash table implementation (with maximum chain length
$O(\ln(n))$), we use two perfectly random hash functions, then the length of the longest chain reduces to $O(\ln\ln(n))$.
In efficient PRAM simulation, the two-choice paradigm helps in reducing the contention ([2]) of processors accessing
the same memory (DRAM). Further, the two-choice scheme can be advantageous in situations where, for example, one
hopes to fit one full chain in a single cache line. The multiple-choice approach has also proven useful in online
(dynamic) assignment of tasks to servers (disk servers, network servers, etc.). By using multiple-choice one would get
much better load balance across the servers as compared to the single-choice approach.

In many practical problems, the underlying data can be multidimensional. This is especially true for parallel data 
mining and machine learning problems, where the input data has many dimensions such as text search where the 
distinct words in the document set can be considered as the dimensions and the total number of dimensions equals the 
size of the vocabulary that could potentially run into millions of words. Because the collection of pages to be indexed 
is so large, it has to be split among n servers. When a user makes a query to a front-end machine, the query is sent to 
all n servers; results are returned to the front-end machine for merging and presentation. Hence, the time to serve the 
query is determined by the slowest of the servers, the critical process. The time for each server to process a one-word 
query is roughly proportional to the number of documents at that server containing the word of interest. Thus, to 
achieve better efficiency, it is necessary to split the documents among the servers in such a way that the number
of documents containing a given word is roughly equal across the servers.

Further, many application domains such as telecommunication, finance and others also involve a huge number of
dimensions, such as genres and sub-genres of songs and videos for collaborative filtering type correlational analysis
between the users. Here, one would like to predict what type of item (song or video) a user could prefer based on the
user's inferred relationship with other similar users. Due to the massive size of the multidimensional data in such
distributed data mining and machine learning problems, one needs to devise online load balancing algorithms. While
dimensionality reduction techniques can reduce the total number of dimensions to work on, even then one needs to handle
data with a large number of dimensions. Further, this data is highly sparse, i.e. the number of filled entries in the
(user $\times$ item) matrix is a small fraction of the total possible entries in the matrix. Thus, distributed data mining and machine learning (for
example in cloud computing environments), suffer from severe scalability and parallel efficiency issues due to huge 
load imbalance across the machines in the compute cluster (cloud). Hence, there is a strong need to address load 
balancing for multidimensional datasets. 

1.1 Probability Distribution for Bin Selection 

The $d$-choice scheme can be characterized by a probability vector $p = (p_1, p_2, p_3, \ldots, p_n)$, where $p_i$
denotes the probability that a ball falls in the $i^{th}$ most loaded bin. Here, the bins are ordered from the most
loaded to the least loaded (ties are broken arbitrarily). Then, $p_1$ denotes the probability that the most loaded bin
receives the current ball, $p_2$ denotes the probability that the second bin (in the order) receives the ball, and so
on. In general, in the $d$-choice scheme, $p_i = \left(\frac{i}{n}\right)^d - \left(\frac{i-1}{n}\right)^d$. For
$d = 1$, $\forall i: p_i = 1/n$, while for $d > 1$, $p_i > p_j$ for $i > j$. Thus, for $d > 1$, the process has a bias
towards the lighter bins. This bias leads to an overall lower gap for the $d > 1$ choice process as compared to the
single choice ($d = 1$) process.
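As a small illustration of the probability vector described above, the following sketch evaluates $p_i = (i/n)^d - ((i-1)/n)^d$ for all ranks. The function name `d_choice_probabilities` is an assumption introduced here for illustration.

```python
def d_choice_probabilities(n, d):
    """Return [p_1, ..., p_n] for the d-choice scheme.

    p_i is the probability that the ball lands in the i-th most loaded bin,
    i.e. that the lightest of d uniformly sampled ranks equals i.
    """
    return [(i / n) ** d - ((i - 1) / n) ** d for i in range(1, n + 1)]
```

For $n = 4$ and $d = 2$ this gives $(1/16, 3/16, 5/16, 7/16)$: the values sum to 1 and increase with the rank $i$, which is the bias towards lighter bins noted above.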

In this paper, we consider the multidimensional variant of the balls and bins problem. One multidimensional
variant, proposed by [6], is as follows: Consider throwing $m$ balls into $n$ bins, where each ball is a uniform
$D$-dimensional 0-1 vector of weight $f$. Here, each ball has exactly $f$ non-zero entries chosen uniformly among all
$\binom{D}{f}$ possibilities (Fig. 1 in Appendix A). The average load in each dimension for each bin is given as
$mf/nD$. Let $l(a, b)$ be the load in dimension $a$ for the $b^{th}$ bin. The gap in a dimension (across the bins) is
given by $gap(a) = \max_b l(a, b) - avg(a)$, where $avg(a)$ is the average load in dimension $a$. The maximum gap
across all the dimensions, $\max_a gap(a)$, then determines the load balance across all the bins and the dimensions.
Thus, for the multidimensional balanced allocation problem, the objective is to minimize the maximum gap (across any
dimension). We refer to the multidimensional ball as md-ball and the multidimensional bin as md-bin.
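The gap definition above translates directly into code. The following sketch assumes the loads are given as a list of per-bin vectors, so that `loads[b][a]` plays the role of $l(a, b)$; the function name and data layout are illustrative assumptions.

```python
def max_gap_across_dimensions(loads):
    """Return max_a gap(a), where gap(a) = max_b l(a, b) - avg(a).

    loads[b][a] is the load of bin b in dimension a.
    """
    n = len(loads)
    num_dims = len(loads[0])
    gaps = []
    for a in range(num_dims):
        column = [loads[b][a] for b in range(n)]
        gaps.append(max(column) - sum(column) / n)
    return max(gaps)
```

For three bins with load vectors `[2, 0]`, `[0, 1]` and `[1, 1]`, dimension 0 has gap $2 - 1 = 1$ and dimension 1 has gap $1 - 2/3$, so the maximum gap is 1.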

In another variation of multidimensional balanced allocation the constraint of uniform distribution for populated
entries is removed. Here again, each ball is a $D$-dimensional 0-1 vector and each ball has exactly $f$ populated
dimensions, but these populated dimensions can have an arbitrary distribution. In the third variation, which is the
most general of the three, the number of populated dimensions, $f$, may be different across the balls, where $f$ is
then a random variable with an appropriate distribution.

Mitzenmacher et al. in [6] addressed both the single choice and $d$-choice paradigm for multidimensional balls and
bins under the assumption that balls are uniform $D$-dimensional $(0, 1)$ vectors, where each ball has exactly $f$
populated dimensions. They show that the gap for multidimensional balls and bins, using the two-choice process, is
bounded by $O(\log\log(nD))$. However, this result is not tight and assumes that $f$ is $polylog(n)$. Due to the
arbitrary number of dimensions and the resulting discrepancy across the dimensions, along with the general case of
$m \gg n$, balanced allocation for multidimensional balls and bins is a challenging problem. In this paper, we compute
bounds on the gap for the symmetric $d$-choice process for multidimensional balls and bins for both sequential and
parallel scenarios.

1 http://en.wikipedia.org/wiki/Collaborative_filtering

1.2 Summary of Key Results & Techniques 

We present detailed analysis for the online sequential and parallel multidimensional balls and bins using the symmetric
$d$-choice process and show that for $n$ bins and $m = O(n)$ balls, the gap (assuming that exactly $f$ populated
dimensions are uniformly distributed over $D$ per ball) achieved is $O(\ln\ln(n))$. We establish the first known bound
of this kind for the $d$-choice process and also show that this bound is tight (within a $D/f$ factor) by providing the
lower bound as well. This improves upon the best prior bound of $O(\log\log(nD))$ [6]. For the general case of $m \gg n$, the upper bound on the
gap IS O((^)(i/2+0 Inln(n)) w.h.p., while the expected gap is still 0(lnln(n)). For non-uniform distribution with 
fixed $f$ and for variable $f$ with binomial distribution, we show that the expected gap is still independent of $m$.

In order to arrive at these results, a novel generic potential function based approach along with the sum load across
the dimensions per bin is used. This is much more challenging than the analysis presented by [10] for the
$(1+\beta)$-choice process, as we obtain a much tighter bound of $O(\ln\ln(n))$ (as compared to
$O\left(\frac{\log(n)}{\beta}\right)$ in [10]). This requires a novel potential function as well as a much tighter
analysis in each lemma to ensure that the expected value of the potential function is less than $O(\ln(n))$ at all
times $t$ and satisfies the super-martingale property.

For parallel multidimensional balls and bins with multiple rounds using the $d$ choice process, we show the upper
bound on the gap as $O(\log\log(n))$ for $m = O(n)$ balls, and extend this bound on the gap to the general case of
$m \gg n$. This is tighter than the $O(\log\log(nD))$ bound that can be obtained using analysis similar to [6].

For the weighted and heavy case ($m \gg n$) using the symmetric multiple choice sequential process for scalar balls,
we prove an upper bound of $O(W^* \log(n))$ (where $W^*$ is the expected weight of the distribution), which improves
upon the best prior bound of $O(n^c)$ [12]. Our analysis technique also provides an alternate proof for the symmetric
$d$-choice process with scalar unweighted $m \gg n$ balls into $n$ bins that is simpler and more elegant as compared to [4].

Further, we present the analysis for bounds on the gap for the $(1+\beta)$ choice multidimensional process and prove
that for $m = O(n)$ the upper bound on the gap is $O\left(\frac{\log(n)}{\beta}\right)$ w.h.p. for uniform distribution
of $f$ dimensions over the $D$ dimensions. For non-uniform distribution with fixed $f$ and also for variable $f$ the
expected gap is $O\left(\frac{\log(n)}{\beta}\right)$, which is independent of $m$. Table 1 summarizes the comparison
between our upper bounds and the best known prior bounds, with key results highlighted.



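Process: d-choice                | Best Prior Bound                                                   | Our Bound
---------------------------------+--------------------------------------------------------------------+------------------------------------------
Multidim, Fixed-f, m = O(n)      | $O(\log\log(nD))$ [6]                                              | $O(\ln\ln(n))$
Multidim, Fixed-f, m >> n        | None                                                               | $O(\ln\ln(n))$ (expected)
Multidim, Var-f                  | None                                                               | $O(\log(n))$ (expected)
Weighted Scalar, m >> n          | $O(n^c)$ (short memory via coupling [12])                          | $O(\log(n))$
Unweighted Scalar, m >> n        | $O(\ln\ln(n))$ (using layered induction and short memory [4])      | $O(\ln\ln(n))$ (using potential function)
Parallel Multidim, m = O(n)      | $O(\log\log(nD))$ (adaptation of [6])                              | $O(\ln\ln(n))$
Parallel Multidim, m >> n        | None                                                               | $O(\ln\ln(n))$ (expected)
Parallel Scalar                  | $O(\ln\ln(n))$ (for m = O(n) [1])                                  | $O(\ln\ln(n))$ (for m >> n)

Process: (1 + β)-choice          | Best Prior Bound                                                   | Our Bound
---------------------------------+--------------------------------------------------------------------+------------------------------------------
Multidim, Fixed-f, m = O(n)      | None                                                               | $O(\log(n)/\beta)$
Multidim, Fixed-f, m >> n        | None                                                               | $O(\log(n)/\beta)$ (expected)
Multidim, Var-f                  | None                                                               | $O(\log(n)/\beta)$ (expected)

Table 1: Upper Bound Comparison for d-choice and (1 + β) Process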



2 Related Work 

Balls into bins is a well-studied abstraction for load balancing problems. Numerous results are known for the sequential
(single dimensional) allocation case when $m$ balls are thrown into $n$ bins, such as: for the multiple choice paradigm
the expected gap between the heaviest bin and the average load is $O\left(\frac{\log\log(n)}{\log(d)}\right)$ [3]; for
the $(1+\beta)$ choice paradigm the gap is $O\left(\frac{\log(n)}{\beta}\right)$ [10]; and for the single choice
paradigm the gap is $O\left(\sqrt{\frac{m\log(n)}{n}}\right)$ [9]. [3] showed that the bound of
$O\left(\frac{\log\log(n)}{\log(d)}\right)$ for the symmetric $d$ choice process is stochastically optimal, i.e. any
other greedy approach using the placement information of the previous balls to place the current ball majorizes their
approach. However, if the alternatives are drawn from different groups then different rules for tie breaking result in
different allocations. [13] presents such an asymmetric strategy and, using witness tree based analysis, proves that
this leads to an improvement in load balance to $O\left(\frac{\ln\ln(n)}{d\ln(\phi_d)}\right)$ w.h.p., where $\phi_2$
is the golden ratio and $\phi_d$ is a simple generalization.

The multiple choice paradigm, and in particular the two-choice paradigm, and balls-and-bins models have several
interesting applications. In particular, the two-choice paradigm can be used to reduce the maximum search time in a
hash table. Instead of using a single perfectly random hash function as in a typical hash table implementation (with
maximum chain length $O(\ln(n))$), if we use two perfectly random hash functions, then the length of the longest chain
reduces to $O(\ln\ln(n))$. In the latter case, when inserting a key, we apply both hash functions to determine the two
possible table entries where the key can be inserted. Then, of the two possible entries, we add the key to the shorter
of the two chains. To search for an element, we have to search through the chains at the two entries given by both hash
functions. If $n$ keys are sequentially inserted into the table, the length of the longest chain is $O(\log\log(n))$
with high probability, implying that the maximum time needed to search the hash table is $O(\log\log(n))$ with high
probability. Further, the two-choice scheme can be advantageous in situations where, for example, one hopes to fit one
full chain in a single cache line. The two-choice approach has also proven useful in online (dynamic) assignment of
tasks to servers (disk servers, network servers, etc.). By using two-choice one would get much better load balance
across the servers as compared to the single-choice approach. If we use $(1+\beta)$ choice, then we would get a gap of
around $\log(n)$ (as compared to the $O(\log\log(n))$ gap for two-choice), but the communication cost to query the load
of the servers will be lower by a factor of $(1+\beta)/2$ as compared to the two-choice approach.
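The insertion and search procedure for two-choice hashing described above can be sketched as follows. This is an illustrative sketch only: the class name, the method names and the use of two salted calls to Python's built-in `hash` (as a stand-in for two perfectly random hash functions) are assumptions introduced here.

```python
class TwoChoiceHashTable:
    """Chained hash table in which every key has two candidate chains."""

    def __init__(self, num_buckets):
        self.num_buckets = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]

    def _candidates(self, key):
        # Assumption: salted built-in hashes approximate two independent,
        # perfectly random hash functions.
        h1 = hash(("salt-1", key)) % self.num_buckets
        h2 = hash(("salt-2", key)) % self.num_buckets
        return h1, h2

    def insert(self, key):
        h1, h2 = self._candidates(key)
        if key in self.buckets[h1] or key in self.buckets[h2]:
            return  # key already stored
        # Add the key to the shorter of the two candidate chains.
        if len(self.buckets[h1]) <= len(self.buckets[h2]):
            self.buckets[h1].append(key)
        else:
            self.buckets[h2].append(key)

    def contains(self, key):
        # A search has to look through both candidate chains.
        h1, h2 = self._candidates(key)
        return key in self.buckets[h1] or key in self.buckets[h2]

    def longest_chain(self):
        return max(len(chain) for chain in self.buckets)
```

A search inspects at most two chains, so its cost is bounded by twice the value returned by `longest_chain()`.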

Cole et al. [7] show that the two-choice paradigm can be applied effectively in a different context, namely, that
of routing virtual circuits in interconnection networks with low congestion. They show how to incorporate the two- 
choice approach to a well-studied paradigm due to Valiant for routing virtual circuits to achieve significantly lower 
congestion. 

Kunal et al. [10] show that for the online sequential $(1+\beta)$ choice process with $n$ bins and $m \gg n$ balls, a
tight bound on the gap can be obtained. They use a potential function based technique and further use a majorization
argument to generalize their result. We present a novel generic potential function based approach with a sum load
function across all dimensions of a bin for multidimensional balls and bins and obtain tight bounds on the gap for the
$d$-choice process for both sequential and parallel scenarios. Our analysis is much more challenging than [10] since we
prove a tighter bound that requires a much tighter analysis in each lemma to prove that the expected value of the
potential function is less than $O(\ln(n))$ at all times $t$. Further, the lower and upper bounds for the $(1+\beta)$
choice process with multidimensional balls and bins are also provided in this paper.

Mitzenmacher et al. in [6] address both the single choice and $d$-choice paradigm for multidimensional balls and
bins under the assumption that balls are uniform $D$-dimensional $(0, 1)$ vectors, where each ball has exactly $f$
populated dimensions. They show that the gap for multidimensional balls and bins, using the two-choice process, is
bounded by $O(\log\log(nD))$. We provide a better bound on the gap ($O(\log\log(n))$) and also provide the bound for
the general case of $m \gg n$. Further, while [6] assumes that $f$ is $polylog(n)$, we do not make any such assumption.
For the multiple round multidimensional parallel balls and bins process where, in each round, each bin accepts at most
a single ball, one can use a layered induction based proof [6] to get a similar bound on the gap of $O(\log\log(nD))$.
Using our novel potential function based analysis we show a tighter upper bound of $O(\log\log(n))$. The bound for the
general case of $m \gg n$ is also provided.

Berenbrink et al. [4] prove an upper bound of $O(\log\log(n))$ for the general case of $m \gg n$ balls and $n$ bins
using a sophisticated analysis involving two main steps. In the first step, they show that when the number of balls is
polynomially bounded by the number of bins the gap can be bounded by $O(\ln\ln(n))$, using the concept of layered
induction and some additional tricks. In particular, they consider the entire distribution of the bins in the analysis
(while in the typical $m = O(n)$ case the bins with load smaller than the average could be ignored). In the second step,
they extend this result to the general $m \gg n$ case by showing that the multiple-choice processes are fundamentally
different from the classical single-choice process in that they have short memory. This property states that given some
initial configuration with gap $\Delta$, after adding $poly(n)$ more balls the initial configuration is forgotten. The
proof of the short memory property is done by analyzing the mixing time of the underlying Markov chain describing the
load distribution of the bins. The study of the mixing time is via a new variant of the coupling method (called
neighboring coupling). We prove the same result on the gap ($O(\log\log(n))$) for the symmetric $d$ choice process with
$m \gg n$, but by using a much simpler and more elegant potential function based approach.
Kunal et al. [12] prove that for weighted balls (weight distribution with finite fourth moment) and $m \gg n$, the
expected gap is independent of the number of balls and is less than $n^c$, where $c$ depends on the weight distribution.
They first prove the weak gap theorem, which says that w.h.p. $Gap(t) < t^{2/3}$. Since in the weighted case the $d$
choice process is not dominated by the one choice process, they prove the weak gap theorem via a potential function
argument. Then, the short memory theorem is proved. While in [4] the short memory theorem is proven via coupling, [12]
uses similar coupling arguments but defines a different distance function and uses a sophisticated argument to show
that the coupling converges. [12] also presents a reduction from the real-weighted case to the integer-weighted case.
We present the results for the weighted case (with integer and real weights and weight distribution with finite second
moment) using an elegant and much simpler potential function based argument and show that the gap for arbitrary
$m \gg n$ is bounded by $O(W^* \log(n))$, where $W^*$ is the expected weight of the distribution. Adler et al. [1]
consider parallel balls and bins with multiple rounds. They present analysis for an
$O\left(\frac{\log\log(n)}{\log(d)}\right)$ bound on the gap (for $m = O(n)$) using
$\left(\frac{\log\log(n)}{\log(d)} + O(d)\right)$ rounds of communication. We generalize this result to the case of
parallel multidimensional balls and bins and arbitrary $m \gg n$ balls with a similar bound on the gap.



3 Symmetric d-choice Process

In this section, we present various results on the bounds on the gap using the symmetric d-choice process including 
unweighted sequential and parallel multidimensional balls and bins and the sequential weighted scalar case. 

3.1 Markov Chain Specification 

As mentioned earlier, a balls-and-bins process can be characterized by a probability distribution vector
$(p_1, p_2, p_3, \ldots, p_n)$, where $p_i$ is the probability that a ball is placed in the $i^{th}$ most loaded
multidimensional bin. Let $x_i^d(t)$ be the random variable that denotes the weight in dimension $d$ for bin $i$; it is
equal to the load of the $d^{th}$ dimension of the $i^{th}$ bin minus the average load in dimension $d$. So,
$\sum_{i=1}^{n} x_i^d(t) = 0, \forall d \in [1..D]$. Each md-ball has $f$ populated dimensions, where $f$ could be
constant across the balls or a random variable with a given distribution. Let $s_i(t)$ denote the sum of the loads
(minus corresponding dimension averages) across all $D$ dimensions for bin $i$ at time $t$, expressed as
$s_i(t) = \sum_{d=1}^{D} x_i^d(t)$. It is assumed that the bins are sorted by $s_i(t)$. So,
$s_i \ge s_{i+1}, \forall i \in [1..n-1]$. The process defines a Markov chain over the matrices $x(t)$ as follows:

• Sample $j \in_p [n]$.

• Set $r_i = s_i(t) + f(1 - 1/n)$, for $i = j$. Since an md-ball has $f$ non-zero entries, each of these $f$
dimensions in the bin $i$ will be incremented by $1 - 1/n$.

• Set $r_i = s_i(t) - f/n$, for $i \ne j$. Since an md-ball has $f$ non-zero entries, each of the corresponding $f$
dimensions in the bin $i$ will be decremented by $1/n$. This ensures that for each dimension the sum across all
the bins is 0.

• Obtain $s(t + 1)$ by sorting $r(t)$.
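The steps above can be sketched as a single update of the sorted sum-load vector. This is a minimal sketch under stated assumptions: the function name `markov_step`, the zero-based list representation of $s$ and $p$, and the explicit random generator argument are illustrative choices, not part of the original specification.

```python
import random


def markov_step(s, p, f, rng):
    """Perform one step of the chain on the sum-load vector s.

    s is sorted in non-increasing order, p[i] is the probability that the
    ball lands in the (i+1)-th most loaded bin, and f is the number of
    populated dimensions of the md-ball.
    """
    n = len(s)
    # Sample the rank j of the receiving bin according to p.
    j = rng.choices(range(n), weights=p, k=1)[0]
    # Every bin loses f/n because each of the f dimension averages grows by 1/n.
    r = [s_i - f / n for s_i in s]
    # The receiving bin additionally gains f, for a net change of f * (1 - 1/n).
    r[j] += f
    # Obtain s(t + 1) by sorting r in non-increasing order.
    return sorted(r, reverse=True)
```

Because the receiving bin gains $f(1 - 1/n)$ and each of the other $n - 1$ bins loses $f/n$, the entries of the returned vector still sum to zero.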

Fig. 1 (in Appendix A) illustrates a multidimensional balls and bins scenario. The bounds on the gap will be
proven for a family of probability distribution vectors $p$. As mentioned earlier, the md-bins are sorted based on their
total dimensional load, i.e. the sum of the weights across all dimensions for each bin ($s_i$ for bin $i$).

In the remaining analysis, we assume that when an md-ball arrives, the selection of the bin is based on $s_i$,
i.e. the total sum of weights across all dimensions for the randomly selected bins (Fig. 1 in Appendix A). In
particular, for the $d = 2$ choice process, when $d$ bins are randomly selected, the md-ball (with $f$ non-zero entries)
is assigned to the md-bin with the lowest $s_i$. Using this selection mechanism, we prove the upper and lower bounds on
the gap obtained for the $d$ choice process. Note that this is a different allocation mechanism than that considered in
[6], where the max criterion is used over the restricted set of $f$ populated dimensions in the current md-ball.
Further, we prove the upper bound for the case when $m \gg n$, while [6] considers the $m = O(n)$ case. The proofs below
hold even for the case when $d > 2$, though we consider the case $d = 2$ for the sake of clarity.
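The selection rule described above can be simulated directly. The sketch below is an illustration, not the authors' implementation: it assumes the fixed-$f$ uniform model, samples $d$ bins with replacement, and uses the raw total load of a bin in place of $s_i$, which gives the same ordering because the subtracted dimension averages are identical for all bins. All names are hypothetical.

```python
import random


def simulate_md_d_choice(n, m, num_dims, f, d=2, seed=0):
    """Throw m md-balls into n md-bins and return (max gap, per-bin totals).

    Each md-ball populates f of the num_dims dimensions, chosen uniformly.
    It is assigned to the sampled bin with the lowest total load.
    """
    rng = random.Random(seed)
    load = [[0] * num_dims for _ in range(n)]
    total = [0] * n
    for _ in range(m):
        dims = rng.sample(range(num_dims), f)
        candidates = [rng.randrange(n) for _ in range(d)]
        # Pick the candidate bin with the lowest sum load across dimensions.
        b = min(candidates, key=total.__getitem__)
        for a in dims:
            load[b][a] += 1
        total[b] += f
    gaps = []
    for a in range(num_dims):
        column = [load[b][a] for b in range(n)]
        gaps.append(max(column) - sum(column) / n)
    return max(gaps), total
```

The returned gap is $\max_a gap(a)$ as defined in Section 1.1, so runs with different values of `d` can be compared directly.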

3.2 Upper Bound On Gap for Unweighted Case 

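Let there be some constants $\epsilon > 0$, $\theta > 1$, $\gamma_1$, $\gamma_2$, $\gamma_3$ and $\gamma_4$, where
$0 < \gamma_1 < \gamma_3 < 1/2 < \gamma_4 < \gamma_2 < 1$ and $d\gamma_1 \sim 1$; $\gamma_1 + \gamma_2 = 1$ and
$\gamma_3 + \gamma_4 = 1$. Since we consider the $d$-choice process, the probability of selecting the bins has strong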
bias in favor of the lightly loaded bins. For $d = 2$, this results in the following:

$$p_{(n\gamma_3)} \le \frac{2n\gamma_3 - 1}{n^2}$$

$$p_{(n\gamma_4)} \ge \frac{2n\gamma_4 - 1}{n^2}$$

This implies thatt^j^^^^^^^j > (1 ^ tI) 'Uid J2i<{nfi)Pi — 7i- We assume that e < 1/4. Further, let a = e/2 f. 
In the analysis below, we assume each md-ball has exactly $f$ populated dimensions ($f$ constant, fixed-f case). This is
similar to the unweighted case with scalar balls.

The md-bins can be arranged in a partial order, according to their $s_i$ values. Define an equi-load group (say $p$) as
a set of bins with the same $s_i$ value. Define the potential of an equi-load group ($p$) as
$\Phi(G_p) = \sum_{k=0}^{|G_p|-1} \frac{e^{\alpha s_{p_0+k}}}{p_0+k}$, where $p_0$ is the beginning index for the group,
$|G_p|$ is the size of the $p^{th}$ group and $e^{\alpha s_k} = e^{\alpha s_{k+1}}, \forall k \in [p_0..(p_0 + |G_p| - 2)]$.
The $n$ bins are partitioned into disjoint equi-load groups (total $|G|$ groups), i.e. each bin is assigned to only a
single equi-load group. The group structure defined here helps in characterizing the change in index of the bin that
gets the ball (after sorting).

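Similarly, define another potential function for an equi-load group as
$\Psi(G_p) = \sum_{k=0}^{|G_p|-1} \frac{e^{-\alpha s_{p_0+k}}}{p_0+k}$. Now, define the following potential functions
over all the groups:

$$\Phi(t) = \Phi(s(t)) = \sum_{p=1}^{|G|} \Phi(G_p)$$

$$\Psi(t) = \Psi(s(t)) = \sum_{p=1}^{|G|} \Psi(G_p)$$

$$\Gamma(t) = \Gamma(s(t)) = \Phi(t) + \Psi(t) \qquad (3.1)$$

where $s_i(t) = \sum_{d=1}^{D} x_i^d(t)$.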

In the beginning, each dimension for each bin has $0$ weight, thus $s_i = 0, \forall i$ and hence, $\Gamma(0) = 2\ln(n)$. We show
that if r(a;(t)) > a ln(n) for some a > 0, then E[T{t + 1) \x{t)] < (1 - 8n(i+e7i) ) * ^i^)- ^^^^ ^^^P^ demonstrating 
that for every given $t$, $E[\Gamma(t)] \in O(\ln(n))$. This implies that the maximum gap is $O(\ln\ln(n))$ w.h.p.
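To make the potential functions concrete, the following sketch evaluates $\Phi$, $\Psi$ and $\Gamma$ for a given sum-load vector. It rests on an assumption about how the definition should be read: the bin at sorted position $i$ (most loaded first) is weighted by $1/i$, with tied bins occupying consecutive positions. The function name `potentials` is likewise introduced only for illustration.

```python
import math


def potentials(s, alpha):
    """Return (Phi, Psi, Gamma) for the sum-load vector s.

    Assumption: the i-th most loaded bin contributes exp(alpha * s_i) / i
    to Phi and exp(-alpha * s_i) / i to Psi.
    """
    ordered = sorted(s, reverse=True)
    phi = sum(math.exp(alpha * s_i) / i for i, s_i in enumerate(ordered, start=1))
    psi = sum(math.exp(-alpha * s_i) / i for i, s_i in enumerate(ordered, start=1))
    return phi, psi, phi + psi
```

Under this reading, the all-zero start gives $\Gamma(0) = 2H_n$, where $H_n$ is the $n$-th harmonic number, which is $2\ln(n)$ up to an additive constant.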

First, consider the change in $\Phi(t)$ (also referred to as $\Phi$ by default) and $\Psi(t)$ (also referred to as
$\Psi$ by default) separately when a ball is thrown with the given probability distribution.

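Lemma 3.1 When an md-ball is thrown into an md-bin, the following inequality holds:

$$E[\Phi(t + 1) - \Phi(t) \mid x(t)] \le \sum_{i=1}^{n} \left[p_i \left(\alpha f + (\alpha f)^2\right) - \frac{\alpha f}{n}\right] \Phi_i \qquad (3.2)$$

where $\Phi_i$ denotes the contribution of bin $i$ to $\Phi$.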

Proof Let $\Delta_i$ be the expected change in $\Phi$ if the ball is put in bin $i$. So,
$r_i(t + 1) = s_i + f(1 - 1/n)$; and for $j \ne i$, $r_j(t + 1) = s_j(t) - f/n$. The new values, i.e. $s(t + 1)$, are
obtained by sorting $r(t + 1)$, and $\Phi(s) = \Phi(r)$. When an md-ball is committed to bin $i$, the bin moves to the
end of the previous equi-load group or it creates a new equi-load group, and hence can be located at index $p_0$
(the beginning location of its prior group) in the new sorted order of the bins. Thus, the expected contribution of
bin $i$ to $\Delta_i$ is given as follows:

$$\frac{e^{\alpha (s_i + f(1 - 1/n))}}{p_0} - \frac{e^{\alpha s_i}}{p_0} = \frac{e^{\alpha s_i}}{p_0}\left[e^{\alpha f (1 - 1/n)} - 1\right]$$

Similarly, the expected contribution of bin $j$ ($j \ne i$) to $\Delta_i$ is given as:

$$\Phi_j \left[e^{-\alpha f/n} - 1\right]$$

Therefore, $\Delta_i$ is given as follows:

$$\begin{aligned}
\Delta_i &= \Phi_i\left[e^{\alpha f (1 - 1/n)} - 1\right] + \sum_{j \ne i} \Phi_j \left(e^{-\alpha f/n} - 1\right) \\
&= \Phi_i e^{-\alpha f/n}\left(e^{\alpha f} - 1\right) + \left(e^{-\alpha f/n} - 1\right)\Phi
\end{aligned}$$
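Thus, we get the overall expected change in $\Phi$ as follows:

$$\begin{aligned}
E[\Phi(t+1) - \Phi(t) \mid x(t)] &= \sum_{i=1}^{n} p_i \Delta_i \\
&= \sum_{i=1}^{n} p_i \left[\Phi_i e^{-\alpha f/n}\left(e^{\alpha f} - 1\right) + \left(e^{-\alpha f/n} - 1\right)\Phi\right] \\
&= \sum_{i=1}^{n} p_i e^{-\alpha f/n} \Phi_i \left(e^{\alpha f} - 1\right) + \left(e^{-\alpha f/n} - 1\right)\Phi \\
&= \sum_{i=1}^{n} \left[p_i e^{-\alpha f/n}\left(e^{\alpha f} - 1\right) + \left(e^{-\alpha f/n} - 1\right)\right]\Phi_i
\end{aligned} \qquad (3.3)$$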



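Now, $e^{-\alpha f/n}\left(e^{\alpha f} - 1\right)$ can be approximated as follows:

$$\begin{aligned}
e^{-\alpha f/n}\left(e^{\alpha f} - 1\right) &\le \left(1 - \alpha f/n + (\alpha f/n)^2\right)\left(1 + \alpha f + (\alpha f)^2 - 1\right) \\
&= \alpha f + (\alpha f)^2 + O\left((\alpha f)^2/n\right) \\
e^{-\alpha f/n}\left(e^{\alpha f} - 1\right) &\le \alpha f + (\alpha f)^2
\end{aligned}$$

Above, since $(\alpha f)^2/n$ is very small for large $n$, we have ignored the small terms. Similarly,
$\left(e^{-\alpha f/n} - 1\right) \approx -\alpha f/n$. Hence, the expected change in $\Phi$ can be given by:

$$E[\Phi(t + 1) - \Phi(t) \mid x(t)] \le \sum_{i=1}^{n} \left[p_i \left(\alpha f + (\alpha f)^2\right) - \frac{\alpha f}{n}\right] \Phi_i \qquad (3.4)$$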
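The quadratic approximation used in this step can be checked numerically. The sketch below compares the exact factor $e^{-\alpha f/n}(e^{\alpha f} - 1)$ with the bound $\alpha f + (\alpha f)^2$; the function names are hypothetical, and the variable `x` stands for the product $\alpha f$, which is at most $1/8$ when $\alpha = \epsilon/(2f)$ and $\epsilon < 1/4$.

```python
import math


def exact_factor(x, n):
    """Exact multiplier exp(-x / n) * (exp(x) - 1), with x = alpha * f."""
    return math.exp(-x / n) * (math.exp(x) - 1.0)


def quadratic_bound(x):
    """Upper bound x + x^2 used in place of the exact multiplier."""
    return x + x * x
```

For instance, with `x = 0.1` and `n = 100` the exact factor is roughly 0.105, below the bound of 0.11.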

Simplifying further and observing that $\Phi_i$ decreases and $p_i$ increases with increasing $i$ from 1 to $n$, one
gets the following Corollary.

Corollary 3.2 $E[\Phi(t + 1) - \Phi(t) \mid x(t)] \le (\alpha f)^2 \cdot \Phi/n$

Proof Since the $p_i$ are increasing and the $\Phi_i$ are decreasing, the maximum value taken by the RHS of equation
(3.4) is attained when $p_i = 1/n$ for all $i \in [1..n]$. Simplifying, we get the result.

Similarly, the change in $\Psi$ can be derived. For the detailed proof refer to Appendix B.

Lemma 3.3 When an md-ball is thrown into an md-bin, the following inequality holds:

$$E[\Psi(t + 1) - \Psi(t) \mid x(t)] \le \sum_{i=1}^{n} \left[p_i \left(-\alpha f + (\alpha f)^2\right) + \frac{\alpha f}{n}\right] \Psi_i \qquad (3.5)$$

Further observing that $p_i > 0$, one gets the following Corollary.

Corollary 3.4 $E[\Psi(t + 1) - \Psi(t) \mid x(t)] \le (\alpha f \cdot \Psi)/n$



In the next two lemmas, Lemma 3.5 and Lemma 3.6, we consider a reasonably balanced md-bins scenario. We
show that for such cases, the expected potential decreases. Specifically, for $s_{(n\gamma_2)} \le 0$, the expected
value of $\Phi$ decreases and for $s_{(n\gamma_1)} \ge 0$, the expected value of $\Psi$ decreases.

Lemma 3.5 Let $ be defined as above. If S(^n-y2)i^) < + < (1 - 

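Proof From equation (3.4), we get:

$$E[\Phi(t + 1) - \Phi(t) \mid x(t)] \le \sum_{i < n\gamma_2} p_i \left(\alpha f + (\alpha f)^2\right)\Phi_i - \frac{\alpha f}{n}\Phi + \sum_{i \ge n\gamma_2} p_i \left(\alpha f + (\alpha f)^2\right)\Phi_i \qquad (3.6)$$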
Now, we need to upper bound the term J2i<n'y2^Pi * ^'^■■f ~^ ('^•/)^)-^T~")- ^^'^^ Pi non-decreasing and 4>i is
non-increasing, the maximum value is achieved when e"'^^ 1/*) — ^ for ^^ch i < 77,72. Hence, e"*' = ■ 

Hence, the maximum value is given as follows. 



1=1 ^ '^^ 1=1 (3.7) 
^ 272$ $ 
~ nln(n72) 'n? 

Similarly, one can upper bound the term, X]i>n72 (^'»^^^)- ^^^^^ Pi is non-decreasing and is non-increasing, 
the maximum value is achieved when e~"*' (^"^(-^^^-j — '^{>n-i2) for ^^^^ i ^ '^72- Hence, e"""' = 1^(1 /^^j ■ 
Hence, the required upper bound is given as follows. 

J=n72 4=n72 (3 g) 



< ^(>n72) < 



Thus, the expected change in \(\Phi\) can be computed, using equation (3.6) and the above bound, as follows:



,2,, 272$ ,/ , ^ , , , , , ,^2^ .,. 2(1-72)$ 



E[$(i + 1) - mnt)] < {o^-f + («-/)^)( 1 , - ^) - * $ + (a./ + (a./)^) * 

nin(n72j nln(nj 

<(a/)-I-/rt( 1)1 '^^^^ I g/ ^'-'^^^^'''^ (3-9) 

~ ln(n72) 2nln(n72) nln(n) 

- 2n 

Lemma 3.6 Let * be defined as above. If S(n^^){t) > then, E[*(t + l)|a;(t)] < (1 - 

Proof The proof is similar to that of Lemma 3.5. See Appendix C for details of the proof.

Now, we consider the remaining cases and show that in case the load across the bins, at time \(t\), is not reasonably balanced, then for \(s_{(n\gamma_2)} > 0\), either \(\Psi\) dominates \(\Phi\) or \(\Gamma < c\), where \(c = poly(1/\epsilon)\).

Lemma 3.7 Let \(s_{(n\gamma_2)} > 0\) and \(E[\Delta\Phi \mid x(t)] \ge -\epsilon\Phi/4n\). Then, either \(\Phi < \epsilon\gamma_1\Psi\), or \(\Gamma < c\) for some \(c = poly(1/\epsilon)\).

Proof From equation p.l6| l, we get: 

n 

E[A$|a;(t)] < ^(p, * (a./ + {a.ff) - a.//n).$, 

i=l 

< ^ {p, * (a.f + (a.ff) - a.f/n).^, + (k * («./ + (a.ff) - a.f/n) * 

2i - 1, 



n'^i '111(7173) 

'<(n73) (3.10) 
■■^ n^i ln(l/73) n 

J>(n73) 

< "^^^"^^ - 1 - 1/n] + "^'^>"^^ [ - 1 - 1/n] 

n ln(n73) n 111(1/73) 

^ af<^<njs r 273 - 111(7173) , a/'^>«73 r 274 - ln(l/73) i 
71 ^ 111(7173) ^ ^ ln(l/73) ^ 

Now, since E[A$|a;(0] > -a/$/27i, we get: $ < 4$(>„^3) [ .^^^^^J-ff^^^^ ]. Let, S = ^.^ 7nax(0, s,)- Note, 
■^i = 0' since for each dimension d, the update maintains that, J2d=i ^fi^) ~ ^- Further, because, s„^3 > 0, 
$(>„73) < ln(l/73) * e^^\ This impUes that, $ < '^^'ff^p' - 



Since, S(„72) > 0' ^ — l'^("-72) * e""'i . If $ < 571 * ^, then we are done. Else, $ > £71 * This implies: 

474 ln(7i)e"^3 



ln(n73) 



> $ > £71 * > £71 ln(n72) * e"^i 



Thus, e"-^/" < (474)(^P%) . So, r < (i±^ * $ < * 474ln(n) ^ ^ < 1+9 ^ 474ln(n) ^ ,474)1^^^ y 

' — ^ eji ^ ' — V e — e ln(n73) — e 111(7173) ^£71'^ 

r < c, where, c = poly{l/e). 

In the Lemma below, we consider the case where the load across the bins at time \(t\) is not reasonably balanced, and \(s_{(n\gamma_1)} < 0\). Here, we show that either \(\Phi\) dominates \(\Psi\) or the potential function is less than \(c\) for \(c = poly(1/\epsilon)\).

Lemma 3.8 Let \(s_{(n\gamma_1)} < 0\) and \(E[\Delta\Psi \mid x(t)] \ge -\alpha f\,\Psi/8n\). Then, either \(\Psi < \epsilon\gamma_1\Phi\), or \(\Gamma < c\ln(n)\) for some \(c = poly(1/\epsilon)\).

Proof The proof is similar to that of Lemma 3.7. See Appendix D for details of the proof.



Now, we consider combinations of the cases considered so far and can show that the potential function, \(\Gamma\), behaves as a super-martingale.

Theorem 3.9 For the potential function \(\Gamma\), \(E[\Gamma(t+1) \mid x(t)] \le \left(1 - \frac{\epsilon}{24n(1+\epsilon\gamma_1)}\right)\Gamma(t) + \frac{c\ln(n)}{n}\), for constant \(c = poly(1/\epsilon)\).

Proof We consider the following cases on intervals of values for \(s_i\).

• Case 1: \(s_{(n\gamma_1)} \ge 0\) and \(s_{(n\gamma_2)} \le 0\). Using Lemma 3.5 and Lemma 3.6 we can immediately see that \(E[\Gamma(t+1) \mid x(t)] \le (1 - \epsilon/16n)\,\Gamma(t)\) and hence, the result is also true.



• Case 2: \(s_{(n\gamma_1)} \ge s_{(n\gamma_2)} > 0\). This represents a high load imbalance across the bins. In some cases, \(\Phi\) may grow but the asymmetry in the load implies that \(\Gamma\) is dominated by \(\Psi\). Thus, the decrease in \(\Psi\) offsets the increase in \(\Phi\) and hence the expected change in \(\Gamma\) is negative.

Specifically, if \(E[\Delta\Phi \mid x] \le -\frac{\epsilon}{4n}\Phi\) (the complement of the hypothesis of Lemma 3.7), then using Lemma 3.6 we get that \(E[\Gamma(t+1) \mid x(t)] \le (1 - \alpha f/8n)\,\Gamma(t)\); else we consider the following two cases:

- Case 2a: $ < £71 * vf. Here, using Lemma [33] and Corollary |3.2[ we get: 

E[AF|x] = E[A$|a;] + E[A'J'|a;] 

< ^ — ^ * * 

n 8n 

<-^.* 

£ „ 

< 



24n(l + £71) 



- Case 2b: F < cln(n). Here, using Corollary 3.4 and Corollary |3.2| we get: 

E[AF|a;] < a.//n*F < ^ 

n 

But, cln(n)/n-((£/47i)*F) > cln(n)/?i(l-£/4) > cln(7i)/n(l-£/2) > £2^M!i). Hence,E[AF|a;] < 

£r I cln(n) 
4n n ■ 



Case 3: s„^2 < Sn-y^ < 0. Here, if E[A^'|a;] < jg^*, then using Lemma 3.5 we get that E[F(t + l)|a;(t)] < 
(1 — £/16n)F(t); else we consider the following two cases: 



- Case 3a: 4" < £71 * $. Here, using Lemma [33] and Corollary |3.4| we get: 

E[AF|a;] = E[A$|a;] + E[A*|j;] 

< -(£/4n)<f> + a.f/n * * 

< -(£/4n).$ + (7i£^/2n)$ 

<^<1> 

- (87i(l + £7i)) *^ 
- Case 3b: \(\Gamma < c\ln(n)\). Here, using Corollary 3.2 and Corollary 3.4 we get:

\[ E[\Delta\Gamma \mid x] \le \frac{\alpha f}{n}\,\Gamma < \frac{c\,\alpha f\,\ln(n)}{n} \]

Hence, this case follows similarly as Case 2b above.
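
Now, we can prove using induction that the expected value of \(\Gamma\) remains bounded.

Theorem 3.10 For any time \(t \ge 0\), \(E[\Gamma(t)] \le \frac{24c(1+\epsilon\gamma_1)}{\epsilon}\ln(n)\).

Proof We prove this claim by induction. For \(t = 0\), it is trivially true since \(\Gamma(0) = 2\ln(n)\). Using Theorem 3.9 we get:

\[ \begin{aligned}
E[\Gamma(t+1)] &= E[E[\Gamma(t+1) \mid \Gamma(t)]] \\
&\le E\left[\left(1 - \frac{\epsilon}{24n(1+\epsilon\gamma_1)}\right)\Gamma(t) + \frac{c\ln(n)}{n}\right] \\
&\le \frac{24c(1+\epsilon\gamma_1)}{\epsilon}\ln(n) - \frac{c\ln(n)}{n} + \frac{c\ln(n)}{n} \\
&\le \frac{24c(1+\epsilon\gamma_1)}{\epsilon}\ln(n)
\end{aligned} \]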

Now, we can upper bound the gap across all the \(D\) dimensions across all the \(n\) md-bins. This gap is defined as follows:

\[ Gap(t) = \max_{d=1}^{D}\left[\max_{i=1}^{n} x_i^d\right] \qquad (3.11) \]

Theorem 3.11 Fixed f Case: Using the bias in the probability distribution in favor of lightly loaded md-bins as given by the d-choice algorithm, and assuming that exactly \(f\) dimensions are populated in each md-ball with uniform distribution of the \(f\) dimensions over \(D\), the expected and probabilistic upper bounds on the gap (maximum dimensional gap) across the multidimensional bins are given as follows. Let \(\delta = \frac{24c(1+\epsilon\gamma_1)}{\epsilon}\), then:

\[ E[Gap(t)] \le 2\log\log(n)/\epsilon + 2\log\log(\delta)/\epsilon \]
\[ \Pr\left[Gap(t) \ge \left(\frac{mf}{nD}\right)^{1/2+\zeta}\left(4\log\log(n)/\epsilon + 4\log\log(\delta)/\epsilon\right)\right] \le \frac{D}{mf} \]

Proof Let \(a\) be the winning md-bin and \(m\) be the winning dimension that represents \(Gap(t)\). Now, from Theorem 3.10 we get \(E[e^{\alpha s_a}] \le \delta\log(n)\). So, \(E[e^{\alpha(x_a^m + \sum_{d \ne m} x_a^d)}] \le \delta\log(n)\). Let \(y_a\) denote the gap as measured by the number of md-balls in bin \(a\) minus the average number of balls across the bins. Then,

\[ \begin{aligned}
E[s_a] &\le \frac{1}{\alpha}\log\log(n) + \frac{1}{\alpha}\log\log(\delta) \\
\Rightarrow E[s_a] &\le 2f\log\log(n)/\epsilon + O(2f\log\log(\delta)/\epsilon) \\
f \cdot E[y_a] &\le 2f\log\log(n)/\epsilon + O(2f\log\log(\delta)/\epsilon) \\
E[y_a] &\le 2\log\log(n)/\epsilon + O(2\log\log(\delta)/\epsilon)
\end{aligned} \]

The third inequality uses the fact that each ball has exactly \(f\) populated dimensions. Since the \(f\) dimensions are chosen uniformly at random from the \(D\) dimensions, the expected gap in any dimension (and hence in the winning dimension with the maximum gap) is bounded by \(O(\log\log(n))\). Now, consider the case of a non-uniform distribution, where we assume that each dimension is chosen with probability at most \(\kappa_2\) in each md-ball and each md-ball still has a fixed number \(f\) of populated dimensions. Here, one can see that the expected gap can be bounded by \(O(\kappa_2\log\log(n))\).

Now, \(\Pr[s_a \ge 4f\log\log(n)/\epsilon + 4f\log\log(\delta)/\epsilon] \le \Pr[\Gamma(t) \ge n\,E[\Gamma(t)]] \le 1/n\) (using Markov's inequality), where \(s_a = \sum_{d=1}^{D} x_a^d\). Further, the probability that within a single md-bin a particular dimension has more than the expected number of 1s can be bounded using the Chernoff bound as follows. Let \(m/n\) balls be thrown into an md-bin. The number of ones in any dimension follows a Binomial distribution, \(B(m/n, f/D)\). Using the Chernoff bound, and assuming \(t \approx (mf/nD)^{1/2+\zeta}\), we have:

\[ \Pr[B(m/n, f/D) \ge mf/nD + t] \le \left(\frac{mf/nD}{mf/nD + t}\right)^{mf/nD + t} e^{t} \]
\[ \Pr[B(m/n, f/D) \ge mf/nD + t] \le nD/mf \]

Hence, \(\Pr[y_a \ge (mf/nD)^{1/2+\zeta}\,(4\log\log(n)/\epsilon + 4\log\log(\delta)/\epsilon)] \le \frac{1}{n} \cdot \frac{nD}{mf} = \frac{D}{mf}\).

3.3 Lower Bound for Unweighted Case

We can show that the expected upper bound, for the fixed \(f\) case with uniform distribution, proved in Section 3.2 is tight to within an \(f/D\) factor. Consider the case when \(m\) balls are thrown into \(n\) bins using the d-choice process. The expected dimensional sum load per bin is \(fm/n\). Berenbrink et al. [4] show that when \(m \gg n\) balls are thrown using the d-choice process into \(n\) bins, then the load of the most loaded bin is at least \(\Omega(\ln\ln(n))\) balls more than the average \(m/n\). Thus, for md-balls the sum load of the most loaded md-bin is at least \(\Omega(f\ln\ln(n) + fm/n)\). Since each ball has \(f\) populated dimensions, there are at least \(\Omega(\ln\ln(n) + m/n)\) balls in this max sum load bin. Since in each ball the \(f\) dimensions are uniformly distributed over the \(D\) dimensions, there exists a dimension whose load is at least \(\Omega(f\ln\ln(n)/D)\) more than the average \(mf/nD\). Hence, the lower bound is \(\Omega(f\ln\ln(n)/D)\).
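
The setting discussed above can be explored empirically. The following is a minimal simulation sketch, assuming that each md-ball picks its \(f\) populated dimensions uniformly at random, that the \(d\) candidate bins are drawn uniformly with replacement, and that the ball goes to the candidate with the smallest total load across dimensions. The function name `simulate_gap` and all parameter values are hypothetical and serve only as an illustration.

```python
import random


def simulate_gap(n, D, f, m, d=2, seed=0):
    """Throw m md-balls into n md-bins and return the maximum dimensional gap.

    Assumptions (for illustration only): f populated dimensions per ball,
    chosen uniformly from D; d candidate bins drawn with replacement; the
    ball is placed in the candidate with the smallest total load.
    """
    rng = random.Random(seed)
    load = [[0] * D for _ in range(n)]  # load[i][k]: ones in dimension k of bin i
    total = [0] * n                     # total[i]: sum of load[i] over dimensions
    for _ in range(m):
        dims = rng.sample(range(D), f)
        candidates = [rng.randrange(n) for _ in range(d)]
        target = min(candidates, key=lambda i: total[i])
        for k in dims:
            load[target][k] += 1
        total[target] += f
    gap = 0.0
    for k in range(D):
        column = [load[i][k] for i in range(n)]
        gap = max(gap, max(column) - sum(column) / n)
    return gap
```

Running the function for increasing `n` with `m` much larger than `n` gives a rough feel for how slowly the gap grows; it does not verify the asymptotic bounds.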

3.4 Parallel Multidimensional Balls & Bins: Unweighted Case 

Consider the following parallel d-choice process. Let \(m\) balls be thrown in parallel using the d-choice process into \(n\) bins. In each round, a bin sends the (ball's) rank to the ball with the lowest ID. The ball chooses the bin (out of the \(d\) bins it selected) that gives the lowest rank. It can be shown that this parallel process produces exactly the same distribution of balls in the bins as a sequential Greedy with Ties process [1]. In the sequential Greedy with Ties process, when there are multiple bins with the same lowest load, all of these bins get the ball. Using the potential function analysis as above, we can show that the gap in this case can also be bounded by \(O(\log\log(n))\). We provide an overview of the proof below. Consider the change in \(\Phi(t)\) (also referred to as \(\Phi\) by default) and \(\Psi(t)\) (also referred to as \(\Psi\) by default) separately when a ball is thrown with the given probability distribution.
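
One step of the sequential Greedy with Ties process described above can be sketched as follows. This is an illustrative reading of the rule "all bins tied for the lowest load among the chosen ones receive the ball"; the function name, the scalar `loads` list, and the `weight` parameter are assumptions made for the example.

```python
def greedy_with_ties_step(loads, choices, weight=1):
    """Place one (possibly replicated) ball among the chosen bins.

    loads:   list of current bin loads, modified in place
    choices: indices of the d bins selected by the ball
    Every chosen bin that has the minimum load among the choices gets a copy.
    Returns the sorted list of bins that received the ball.
    """
    lowest = min(loads[i] for i in choices)
    winners = sorted({i for i in choices if loads[i] == lowest})
    for i in winners:
        loads[i] += weight
    return winners
```

For example, with loads `[2, 1, 1, 3]` and choices `[1, 2, 3]`, bins 1 and 2 are tied for the lowest load and both receive a copy of the ball.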

Lemma 3.12 When an md-ball is thrown into an md-bin, the following inequality holds:

\[ E[\Phi(t+1) - \Phi(t) \mid x(t)] \le \sum_{i=1}^{n} \left[d\,p_i\,(\alpha f + (\alpha f)^2) - \frac{d\,\alpha f}{n}\right]\Phi_i \qquad (3.14) \]

Proof In the Greedy with Ties process, some number (at most \(d\)) of bins, each with the same load (and hence belonging to the same equi-load group), can get the (replicated) ball. In the worst case all the \(d\) randomly selected bins chosen by the ball have the same load and hence get the md-ball. All of these md-bins then move to the previous equi-load group, or a new equi-load group is created. Let \(\Delta\) be the expected change in \(\Phi\) when the ball is put in a certain number (at most \(d\)) of bins. If one of these bins is \(i\), then \(r_i(t+1) = s_i + f(1 - d/n)\). For bins \(j \ne i\) that do not get the md-ball, \(r_j(t+1) = s_j(t) - df/n\). The new values, i.e. \(s(t+1)\), are obtained by sorting \(r(t+1)\) and \(\Phi(s) = \Phi(r)\). Thus, the expected contribution of bin \(i\) to \(\Delta\) is given as follows:

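\[ E\left[\frac{e^{\alpha(s_i + f(1 - d/n))}}{i}\right] - \frac{e^{\alpha s_i}}{i} = \frac{e^{\alpha s_i}}{i}\left[e^{\alpha f(1 - d/n)} - 1\right] \]

Similarly, the expected contribution of a bin \(j\) (\(j \ne i\)) that does not get the ball to \(\Delta\) is given as:

\[ E\left[\frac{e^{\alpha(s_j - df/n)}}{j}\right] - \frac{e^{\alpha s_j}}{j} = \frac{e^{\alpha s_j}}{j}\left[e^{-\alpha df/n} - 1\right] \]

Assuming that the bins that get the replicated ball are \(i_1, i_2, \ldots, i_d\), \(\Delta\) is given as follows:

\[ \begin{aligned}
\Delta &= d\,\Phi_i\left[e^{\alpha f(1 - d/n)} - 1\right] + \sum_{j \notin \{i_1,\ldots,i_d\}} \Phi_j\left(e^{-\alpha df/n} - 1\right) \\
&= d\,\Phi_i\,e^{-\alpha df/n}(e^{\alpha f} - 1) + (e^{-\alpha df/n} - 1)\,\Phi
\end{aligned} \]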
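Thus, we get the overall expected change in \(\Phi\) as follows:

\[ \begin{aligned}
E[\Phi(t+1) - \Phi(t) \mid x(t)] &= \sum_{i=1}^{n} p_i\,\Delta \\
&= \sum_{i=1}^{n} p_i\left[d\,\Phi_i\,e^{-\alpha df/n}(e^{\alpha f} - 1) + (e^{-\alpha df/n} - 1)\,\Phi\right] \\
&= \sum_{i=1}^{n} \left[p_i\,d\,e^{-\alpha df/n}(e^{\alpha f} - 1) + (e^{-\alpha df/n} - 1)\right]\Phi_i
\end{aligned} \qquad (3.15) \]

Now, \(e^{-\alpha df/n}(e^{\alpha f} - 1)\) can be approximated as follows:

\[ e^{-\alpha df/n}(e^{\alpha f} - 1) \le \left(1 - \frac{\alpha df}{n} + \left(\frac{\alpha df}{n}\right)^2\right)\left(1 + \alpha f + (\alpha f)^2 - 1\right) \approx \alpha f + (\alpha f)^2 + O\!\left(\frac{(\alpha df)^2}{n}\right) \]
\[ e^{-\alpha df/n}(e^{\alpha f} - 1) \le \alpha f + (\alpha f)^2 \]

Above, since \((\alpha f d)^2/n\) is very small for large \(n\), we have ignored the small terms. Similarly, \((e^{-\alpha df/n} - 1) \approx -\alpha df/n\). Hence, the expected change in \(\Phi\) can be given by:

\[ E[\Phi(t+1) - \Phi(t) \mid x(t)] \le \sum_{i=1}^{n} \left[p_i\,d\,(\alpha f + (\alpha f)^2) - \frac{\alpha df}{n}\right]\Phi_i \qquad (3.16) \]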

Similarly, one can show the following.

Lemma 3.13 When an md-ball is thrown into an md-bin, the following inequality holds:

\[ E[\Psi(t+1) - \Psi(t) \mid x(t)] \le \sum_{i=1}^{n} \left[d\,p_i\,(-\alpha f + (\alpha f)^2) + \frac{d\,\alpha f}{n}\right]\Psi_i \qquad (3.17) \]

Following similar lines of proof as for the sequential multidimensional case, one can hence show that: 



Theorem 3.14 For any time t > 0, E[r{t)] < 1+^71) ^^g^^^ 



Thus, this parallel balls and bins process with \(m \gg n\) balls and \(n\) bins takes \(O(m/n + \log\log(n))\) rounds and results in maximum bin load \(O(m/n + \log\log(n))\), giving an upper bound on the gap of \(O(\log\log(n))\). Hence, one can derive (using Theorem 3.11) that the gap for the multidimensional parallel scenario is also bounded by \(O((mf/nD)^{1/2+\zeta}\log\log(n))\) with high probability.

3.5 Upper Bound On Gap: Weighted Case 

Here, we consider the case when the multidimensional balls have a variable number of populated dimensions, \(f\). The sum of dimensional load in an md-ball, \(f\), is thus a random variable. We assume that the distribution for \(f\) has a finite second moment and average value \(f^*\). For this distribution, we assume that there is a \(\lambda > 0\) such that the moment generating function \(M[\lambda] = E[e^{\lambda f}] < \infty\). Note that \(M''(z) = E[f^2 e^{zf}] \le \sqrt{E[f^4]\,E[e^{2zf}]}\) by the Cauchy-Schwarz inequality. The above assumption implies that there is an \(S \ge 1\) such that for every \(|z| < \lambda/2\) it holds that \(M''(z) < 2S\). Our analysis below is primarily for integer valued \(f\) and for the multidimensional case. However, it can be easily seen that a similar analysis holds for scalar balls and bins with real valued weight per ball \(W\) and \(D = 1\) (still assuming that the distribution of \(W\) has finite second moment).

The weighted case is more challenging than the unweighted case, since we have to carefully consider the change in the rank of a bin when an md-ball of total dimensional load (weight) \(f\) falls in it, as the change in rank could increase the potential by a large amount. Thus, the potential function used in Section 3.2 might not work in this case and we
need to devise a new one. Assume that e < 1/4. Further, let a — min (j^, j, fj^)- Define the following potential 
functions over the bins: 

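\[ \begin{aligned}
\Phi(t) &= \Phi(s(t)) = \sum_{i=1}^{n} \frac{e^{\alpha s_i}}{n^2 + i} \\
\Psi(t) &= \Psi(s(t)) = \sum_{i=1}^{n} \frac{e^{-\alpha s_i}}{n^2 + n - i + 1} \\
\Gamma(t) &= \Gamma(s(t)) = \Phi(t) + \Psi(t)
\end{aligned} \qquad (3.18) \]

where \(s_i(t) = \sum_{d=1}^{D} x_i^d(t)\).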

In the beginning, each dimension for each bin has 0 weight, thus \(s_i = 0\ \forall i\) and hence \(\Gamma(0) \le 2(n/(n^2+1)) < 2/n\). We show that if \(\Gamma(x(t)) \ge a/n\) for some \(a > 0\), then \(E[\Gamma(t+1) \mid x(t)] \le \left(1 - \frac{\alpha f^*}{16n(1+\epsilon\gamma_1)}\right)\Gamma(t)\). This helps in demonstrating that for every given \(t\), \(E[\Gamma(t)] \in O(1/n)\). This implies that the maximum gap is \(O(\log(n))\) w.h.p.

First, consider the change in \(\Phi(t)\) (also referred to as \(\Phi\) by default) and \(\Psi(t)\) (also referred to as \(\Psi\) by default) separately when a ball is thrown with the given probability distribution. Let there be constants \(0 < \gamma_1 < \gamma_2 < 1/2 < \gamma_4 < \gamma_3\), such that \(\gamma_2 + \gamma_3 > 1\) and \(\gamma_1 + \gamma_4 < 1\) and \(\gamma_2 < 7/16\).
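
To make the rank-weighted potentials concrete, the sketch below evaluates them for a given sorted load vector. It assumes the potentials take the form \(\Phi = \sum_i e^{\alpha s_i}/(n^2+i)\) and \(\Psi = \sum_i e^{-\alpha s_i}/(n^2+n-i+1)\), which is our reading of (3.18) and is consistent with the stated initial value \(\Gamma(0) \le 2n/(n^2+1) < 2/n\). The function name and the value of `alpha` used in any call are hypothetical.

```python
import math


def weighted_potentials(s, alpha):
    """Return (phi, psi, gamma) for the rank-weighted potentials.

    s is the list of normalized total loads, sorted so that s[0] belongs to
    the most loaded bin (rank i = 1). The denominators n^2 + i and
    n^2 + n - i + 1 follow the assumed form of (3.18).
    """
    n = len(s)
    phi = sum(math.exp(alpha * s[i - 1]) / (n * n + i) for i in range(1, n + 1))
    psi = sum(math.exp(-alpha * s[i - 1]) / (n * n + n - i + 1)
              for i in range(1, n + 1))
    return phi, psi, phi + psi
```

With all loads equal to zero the two sums run over the same set of denominators, so the two potentials coincide and their sum stays below \(2/n\).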

Lemma 3.15 When an md-ball is thrown into an md-bin, the following inequality holds: 

n 

E[i>(t + 1) - ^{t)\x{t)] < Y^[pi * {a.f* + 1/n + Sa^) - a.f /n].e°'-'^ (3.19) 

1=1 

Proof Let \(\Delta_i\) be the expected change in \(\Phi\) if the ball is put in bin \(i\). So, \(r_i(t+1) = s_i + f(1 - 1/n)\); and for \(j \ne i\), \(r_j(t+1) = s_j(t) - f/n\). The new values, i.e. \(s(t+1)\), are obtained by sorting \(r(t+1)\) and \(\Phi(s) = \Phi(r)\). When an md-ball is committed to bin \(i\), it jumps to an index \(i_{new}\) which is less than or equal to \(i\) in the new bin order. Thus, the expected contribution of bin \(i\) to \(\Delta_i\) is given as follows:

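\[ \begin{aligned}
E\left[\frac{e^{\alpha(s_i + f(1-1/n))}}{n^2 + i_{new}}\right] - \frac{e^{\alpha s_i}}{n^2 + i}
&\le \frac{e^{\alpha s_i}}{n^2 + i}\left[\frac{M(\alpha(1-1/n))\,(n^2 + n)}{n^2 + 1} - 1\right] \\
&\le \Phi_i\left[\left(M(0) + M'(0)\,\alpha(1-1/n) + M''(0)\,(\alpha(1-1/n))^2\right)(1 + 1/n) - 1\right] \\
&\le \Phi_i\left[\left(1 + f^*\alpha(1-1/n) + S\alpha^2\right)(1 + 1/n) - 1\right] \quad \because M(0) = 1,\ M'(0) = E(f) = f^*,\ M''(0) \le 2S \\
&\le \Phi_i\left[f^*\alpha(1-1/n) + S\alpha^2 + \left(1 + f^*\alpha(1-1/n) + S\alpha^2\right)/n\right] \\
&\le \Phi_i\left[f^*\alpha + 1/n + S\alpha^2\right]
\end{aligned} \qquad (3.20) \]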

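The bins that were at index \(j \in [i_{new}..(i-1)]\) shift right by one position and hence the expected contribution of such a bin \(j\) to \(\Delta_i\) is given as:

\[ \begin{aligned}
E\left[\frac{e^{\alpha(s_j - f/n)}}{n^2 + j + 1}\right] - \frac{e^{\alpha s_j}}{n^2 + j}
&\le \Phi_j\left[M(-\alpha/n) - 1\right] \\
&\le \Phi_j\left[M(0) + M'(0)(-\alpha/n) + M''(0)\frac{\alpha^2}{n^2} - 1\right] \\
&\le \Phi_j\left[\frac{-f^*\alpha}{n} + \frac{S\alpha^2}{n^2}\right]
\end{aligned} \qquad (3.21) \]

For all other bins, their rank does not change in the new bin order, hence their expected contribution to \(\Delta_i\) is given as:

\[ E\left[\frac{e^{\alpha(s_j - f/n)}}{n^2 + j}\right] - \frac{e^{\alpha s_j}}{n^2 + j} \le \Phi_j\left[\frac{-f^*\alpha}{n} + \frac{S\alpha^2}{n^2}\right] \qquad (3.22) \]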



Using equations (3.20), (3.21) and (3.22), \(\Delta_i\) is given as follows:

A, = {-a.f/n + 1/n + Sa^)^^ + 

n 

Hence, the expected change in \(\Phi\) can be given by:

\[ E[\Phi(t+1) - \Phi(t) \mid x(t)] \le \sum_{i=1}^{n} \left[p_i\,(\alpha f^* + 1/n + S\alpha^2) - \frac{\alpha f^*}{n}\right]\Phi_i \qquad (3.23) \]

Simplifying further and observing that \(\Phi_i\) decreases and \(p_i\) increases with increasing \(i\) from 1 to \(n\), one gets the following Corollary.

Corollary 3.16 \(E[\Phi(t+1) - \Phi(t) \mid x(t)] \le (\alpha f^* + 2S\alpha^2)\,\Phi/n\)

Proof Since the \(p_i\) are increasing and the \(\Phi_i\) are decreasing, the maximum value taken by the RHS of equation (3.23) will be
when e" "' * V" , t^-^ = $. Thus, e""-'' = n^. Hence, 

n n . 

i=i »=i ^ (3,24) 

$ 2?i- 1 

< — * 

n n + 1 

Using equation (3.23), we get:

(2n-l)$ 



E[$(t + 1) - < (a./*(l + 1/n) + 1/n + Sa^ 



n{n + 1) 



< (a.f* +2Sa^)- 
n 



(3.25) 



Similarly, the change in \(\Psi\) can be derived. For the detailed proof, refer to Appendix E.

Lemma 3.17 When an md-ball is thrown into an md-bin, the following inequality holds:

E[*(t + 1) - < * {-a.f* + ^ + a-/7«]*. (3-26) 

Further observing that \(p_i > 0\), one gets the following Corollary.

Corollary 3.18 \(E[\Psi(t+1) - \Psi(t) \mid x(t)] \le (\alpha f^*\,\Psi)/n\)

In the next two lemmas, Lemma 3.19 and Lemma 3.20, we consider a reasonably balanced md-bins scenario. We show that for such cases, the expected potential decreases. Specifically, for \(s_{(n\gamma_2)} \le 0\), the expected value of \(\Phi\) decreases and for \(s_{(n\gamma_1)} \ge 0\), the expected value of \(\Psi\) decreases.

Lemma 3.19 Let $ be defined as above. //'s(„^2)(*) < f'^^"' E[^{t + l)|a;(i)] < (1 - 
Proof From equation ( |3.23| l, we get, 

n 

E[$(t + 1) - a>W|x(t)] < J2iP^ * i^f* + S{am - a/7")-*.: 

^<n72 i>n72 

Now, we need to upper bound the term J2i<n-y2^P'' * ('-'^•/* + Since is non-decreasing and (E>i is non- 

increasing, the maximum value is achieved when e""' Y^"'', , ^^ = for each i < 7172. Hence, e"** = 
Hence, the maximum value is given as follows. 

^ ci>(n + 72) ^,2i-l 1 , 

72 ^ + z 

2—1 1—1 

^ + ^ 72(2^72 - 1) (3.28) 
~ 7^272 (ri + 72) 

^ (2n72 - 1)$ 

Similarly, one can upper bound the term, J^iyn-y-, (Pi (n^+i) ) ■ Since pi is non-decreasing and <i>i is non-increasing, the 

maximum value is achieved when e"^' (X]iL(„^2) :f[T:fi) = *J'(>n72) f^J" ^^ch i > 7172. Hence, e"^' = '^'"'^j'^'^")^^^'' . 
Thus, the expected change in $ can be computed, using equation p.27| i and the above bound, as follows: 

E[Aa>|x(ijJ < [a. J + ba ) ^3 * "P + (a.j + ba ) * — 



< 



< 



n?{n + 1) 
2a./*72$ 

n n 
(272 - 

n 



(3.29) 



<^|{^V72<(l/2-l/16) 
Lemma 3.20 Let ^ be defined as above. Ifs(^n-a)it) > then, E[v['(t + < (1 - ^5^)*

Proof The proof is similar to that of Lemma 3.19. See Appendix F for details of the proof.



Now, we consider the remaining cases and show that in case the load across the bins, at time \(t\), is not reasonably balanced, then for \(s_{(n\gamma_2)} > 0\), either \(\Psi\) dominates \(\Phi\) or \(\Gamma < c/n\), where \(c = poly(1/\epsilon)\).

Lemma 3.21 Let \(s_{(n\gamma_2)} > 0\) and \(E[\Delta\Phi \mid x(t)] \ge -\alpha f^*\Phi/8n\). Then, either \(\Phi < \epsilon\gamma_1\Psi\), or \(\Gamma < c/n\) for some \(c = poly(1/\epsilon)\).

Proof From equation p.l6| l, we get: 

n 

E[A$|x(<)] < ^(p, * (a./* + So") - a./7n).$, 

1=1 

< ^ (p. * (a./* + Sa^) - a.r/n).^, + Y.^P-* + '^"') " * 

i<n73 ■i>n73 

»<(n73) (3 30) 

^ 1-73 ^ ^2(772+1) n 

i>{nj3) 

^ a/*$<„T,3 (2?i73 - 1) ^ a/*$>„^3 73 a./*$ 



n- + 73 1 - 73 n + 1 n 

af*^ f 273 2^73-1 73 

( — ^ 1)+"/ *>»73 7 — ^ ^ + n ^7 — 

n n + 73 n-(»^ + 73) (l-73)(«+l) 



Now, since E[A$|x(t)] > -a/$/8n, we get: $ < 4n$(>„^3) [ 45^+373) ]■ Let, B = E» '^aa;(0, s,). 
Note, Sj = 0, since for each dimension d, the update maintains that, X^dLi ^fi^) — 0- Further, because, Sny-^ > 0, 

$(>„73) < ^ * e^^^- This implies that, $ < tiZlll'Knt^^y 

a-B 

Since, S(„72) > 0, so, ^I* > ^* ^{^-^-,2) . If $ < g^j^ ^ vj/^ then we are done. Else, $ > £71 * ^. This implies: 

4(n - 2)73 ^ ^ ^ „^ ^ "£71 



e""'3 7- r- r > $ > £71 * v[/ > * e*"- ""'2) 

(4n + 373)(n + 73) 72 

Thus, e"-^/" < ( 4(n-2)7372 ) (1,^74 -^D . So, F < * $ < * !'3"",'/l , * 6^ 

' — Vne7i(4n+373)(n+73) ' — e — e (4n+373)(n+73) 

Hence, F < c/n, where, c = poly{l/e). 

In the Lemma below, we consider the case where the load across the bins at time \(t\) is not reasonably balanced, and \(s_{(n\gamma_1)} < 0\). Here, we show that either \(\Phi\) dominates \(\Psi\) or the potential function is less than \(c/n\) for \(c = poly(1/\epsilon)\).

Lemma 3.22 Let \(s_{(n\gamma_1)} < 0\) and \(E[\Delta\Psi \mid x(t)] \ge -\alpha f^*\Psi/2n\). Then, either \(\Psi < \epsilon\gamma_1\Phi\), or \(\Gamma < c/n\) for some \(c = poly(1/\epsilon)\).

Proof The proof is similar to that of Lemma 3.21. See Appendix G for details of the proof.



Now, we consider combinations of the cases considered so far and can show that the potential function, \(\Gamma\), behaves as a super-martingale.

Theorem 3.23 For the potential function \(\Gamma\), \(E[\Gamma(t+1) \mid x(t)] \le \left(1 - \frac{\alpha f^*}{16n(1+\epsilon\gamma_1)}\right)\Gamma(t) + \frac{c}{n^2}\), for constant \(c = poly(1/\epsilon)\).

Proof We consider the following cases on intervals of values for \(s_i\).



Case 1: \(s_{(n\gamma_1)} \ge 0\) and \(s_{(n\gamma_2)} \le 0\). Using Lemma 3.19 and Lemma 3.20 we can immediately see that \(E[\Gamma(t+1) \mid x(t)] \le (1 - \alpha f^*/8n)\,\Gamma(t)\) and hence, the result is also true.

Case 2: \(s_{(n\gamma_1)} \ge s_{(n\gamma_2)} > 0\). This represents a high load imbalance across the bins. In some cases, \(\Phi\) may grow but the asymmetry in the load implies that \(\Gamma\) is dominated by \(\Psi\). Thus, the decrease in \(\Psi\) offsets the increase in \(\Phi\) and hence the expected change in \(\Gamma\) is negative.

Specifically, if \(E[\Delta\Phi \mid x] \le -\frac{\alpha f^*}{8n}\Phi\) (the complement of the hypothesis of Lemma 3.21), then using Lemma 3.20 we get that \(E[\Gamma(t+1) \mid x(t)] \le (1 - \alpha f^*/8n)\,\Gamma(t)\); else we consider the following two cases:



- Case 2a: $ < £71 * Here, using Lemma [3. 20| and Corollary |3.16| we get: 

E[Ar|a;] = E[A$|a;] + E[A*|a;] 
a.f* ^ af* ^ 
n 2n 



< 



4n 



< r 

4n(l + £71) 



- Case 2b: F < c/n. Here, using Corollary 3.18 and Corollary 3.16 we get: 



„r . 1 a.f* „ ca. f* 
E[Ar|a;] < * L < — ^- 



But, c/n2 _ ((a./*/8n) * T) > c/n^{l - a.f* /S) > c/n^{l - a.f* /2) > 
Hence,E[Ar|a;] <-^^ + f. 



Case3: s„^2 < s„^i < 0. Here, if E[A^'|a;] < then using Lemma 3. 19 wegetthatE[r(t+l)|a:(t)] < 

(1 — a./*/8n)r(i); else we consider the following two cases: 



- Case 3a: 5* < £71 * $. Here, using Lemma [3.19| and Corollary |3.18 we get: 



E[Ar|a;] = E[A$|a:] + E[A*|x] 

< -(a./78n)$ + a.f* In * ^ 

< -(a./78n).$ + (7i£a./7n)$ 
-a.f* 



< 



16n 



< 



-a.f* 



16n(l + £7i) 



- Case 3b: T < c/n. Here, using Corollary 3.16 and Corollary 3.18 we get: 



EfArlx] = a./7n*r < 
Hence, this case follows similarly as Case 2b above. 
Now, we can prove using induction that the expected value of \(\Gamma\) remains bounded.

Theorem 3.24 For any time \(t \ge 0\), \(E[\Gamma(t)] \le \frac{16c(1+\epsilon\gamma_1)}{n\,\alpha f^*}\).

Proof We prove this claim by induction. For \(t = 0\), it is trivially true since \(\Gamma(0) \le 2/n\). Using Theorem 3.23 we get:

\[ \begin{aligned}
E[\Gamma(t+1)] &= E[E[\Gamma(t+1) \mid \Gamma(t)]] \\
&\le E\left[\left(1 - \frac{\alpha f^*}{16n(1+\epsilon\gamma_1)}\right)\Gamma(t) + \frac{c}{n^2}\right] \\
&\le \frac{16c(1+\epsilon\gamma_1)}{n\,\alpha f^*} - \frac{c}{n^2} + \frac{c}{n^2} \\
&\le \frac{16c(1+\epsilon\gamma_1)}{n\,\alpha f^*}
\end{aligned} \]

Theorem 3.25 Variable f Case (Weighted Case) Gap: Using the bias in the probability distribution in favor of lightly
loaded md-bins as obtained from the d-choice process, and assuming that in each ball, each dimension is chosen as
1 with probability \(q\) (variable \(f\) case); the expected and probabilistic upper bound on the gap (maximum dimensional
gap) across the multidimensional bins is given as follows. Let, 5 = cmd C > 0, then: 

\[ E[Gap(t)] \le 2q\log(n)/\epsilon + 2q\log(\delta)/\epsilon \]
\[ \Pr\left[Gap(t) \ge (mq/n)^{1/2+\zeta}\,\left(4q\log(n)/\epsilon + 4q\log(\delta)/\epsilon\right)\right] \le \frac{1}{qm} \]

Proof Since each dimension is assigned 1 with probability \(q\), the average number of ones per md-ball is \(f^* = Dq\). Let \(a\) be the winning md-bin and \(m\) be the winning dimension that represents \(Gap(t)\). The number of ones in any ball, \(f\), follows a Binomial(\(D\), \(q\)) distribution and has finite second moment. Using the analysis for the weighted balls case, we get \(E[e^{\alpha s_a}] \le n\delta\), where \(\alpha f^* \le \epsilon/2\). So, \(E[e^{\alpha(x_a^m + \sum_{d \ne m} x_a^d)}] \le n\delta\). Taking the logarithm of both sides, we get:

\[ E[x_a^m] + \sum_{d \ne m} E[x_a^d] \le \log(n)/\alpha + \log(\delta)/\alpha \qquad (3.31) \]

If \(k\) is the expected number of balls thrown in bin \(a\) minus the average number of balls per bin, then \(E[x_a^m] = kq\) and similarly, \(\sum_{d \ne m} E[x_a^d] = (D - 1)kq\). Hence, we get:

\[ \begin{aligned}
Dkq &\le 2f^*\log(n)/\epsilon + 2f^*\log(\delta)/\epsilon \\
\Rightarrow k &\le 2\log(n)/\epsilon + 2\log(\delta)/\epsilon \\
\Rightarrow E[x_a^m] &\le 2q\log(n)/\epsilon + 2q\log(\delta)/\epsilon
\end{aligned} \]

The probabilistic bound can be computed similarly to the fixed \(f\) case (Theorem 3.11) using the Chernoff bound. □

Note that for the scalar case, when the expected weight of the distribution is \(W^*\), the upper bound on the gap obtained is \(O(W^*\log(n))\), which after normalization, i.e. \(E(W) = W^* = 1\), leads to an \(O(\log(n))\) gap. This improves
upon the best prior known bound of 0{n'^) given in flT]. 

4 (1 + β) Choice Process with Multidimensional Balls and Bins

In this section we present upper and lower bounds on the gap for the \((1 + \beta)\) choice process with multidimensional balls and bins.

4.1 Markov Chain Specification 

As mentioned earlier, a balls-and-bins process can be characterized by a probability distribution vector \((p_1, p_2, p_3, \ldots, p_n)\), where \(p_i\) is the probability a ball is placed in the \(i\)-th most loaded multidimensional bin. Let \(x_i^d(t)\) be the random variable that denotes the weight in dimension \(d\) for bin \(i\); it is equal to the load of the \(d\)-th dimension of the \(i\)-th bin minus the average load in dimension \(d\). So, \(\sum_{i=1}^{n} x_i^d(t) = 0,\ \forall d \in [1..D]\). Let \(s_i(t)\) denote the sum of the loads (minus corresponding dimension averages) across all \(D\) dimensions for the bin \(i\) at time \(t\), expressed as \(s_i(t) = \sum_{d=1}^{D} x_i^d(t)\). It is assumed that bins are sorted by \(s_i(t)\). So, \(s_i \ge s_{i+1}\ \forall i \in [n-1]\). The process defines a Markov chain over the matrices \(x(t)\) as follows:

• Sample \(j \in_p [n]\).

• Set \(r_i = s_i(t) + f(1 - 1/n)\) for \(i = j\). Since each md-ball has \(f\) non-zero entries, each of these \(f\) dimensions in the bin \(i\) will be incremented by \(1 - 1/n\).

• Set \(r_i = s_i(t) - f/n\) for \(i \ne j\). Since each md-ball has \(f\) non-zero entries, each of the corresponding \(f\) dimensions in the bin \(i\) will be decremented by \(1/n\). This ensures that for each dimension the sum across all the bins is 0.

• Obtain \(s(t+1)\) by sorting \(r(t)\).
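
The update steps listed above can be sketched at the level of the total loads \(s_i\) as follows. This is a minimal illustration, not the authors' implementation: it tracks only the scalar sums \(s_i\) rather than the full matrix \(x(t)\), and the function name `markov_step` and its arguments are hypothetical.

```python
import random


def markov_step(s, p, f, rng):
    """Apply one step of the chain to the sorted load vector s.

    s:   normalized total loads, sorted non-increasingly (s[0] most loaded)
    p:   probability vector, p[i] is the chance the ball goes to rank i + 1
    f:   number of populated dimensions of the arriving md-ball
    rng: a random.Random instance
    """
    n = len(s)
    j = rng.choices(range(n), weights=p, k=1)[0]  # sample j according to p
    r = [s_i - f / n for s_i in s]                # every other bin loses f/n
    r[j] = s[j] + f * (1 - 1 / n)                 # the chosen bin gains f(1 - 1/n)
    return sorted(r, reverse=True)                # re-sort to obtain s(t + 1)
```

Because the chosen bin gains \(f(1 - 1/n)\) and each of the other \(n - 1\) bins loses \(f/n\), the entries of the returned vector still sum to zero up to floating-point error.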

Fig. 1 (in Appendix A) illustrates a multidimensional balls and bins scenario. The bounds on the gap will be proven for a family of probability distribution vectors \(p\). As mentioned earlier, the md-bins are sorted based on their total dimensional load, i.e. the sum of the weights across all dimensions for each bin (\(s_i\) for bin \(i\)). We make the following assumptions:

• \(\forall i \in [1, n-1]\), \(p_i \le p_{i+1}\). This assumption states that the allocation rule is no worse than the 1-choice scheme.

• For some constants \(\epsilon > 0\), \(\theta > 1\) and \(0 < \gamma_3 < \gamma_4 < 1\), where \(\gamma_3 + \gamma_4 = 1\), it holds that:

\[ p_{(n\gamma_3)} \le \frac{1 - \theta\epsilon}{n} \quad \text{and} \quad p_{(n\gamma_4)} \ge \frac{1 + \theta\epsilon}{n} \qquad (4.1) \]

This assumption states that the allocation rule strictly prefers the least loaded \(\gamma_3\) fraction of the \(n\) bins over the most loaded \((1 - \gamma_4)\) fraction.
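These assumptions imply that for some constants \(\gamma_1\) and \(\gamma_2\), where \(0 < \gamma_1 < \gamma_3 < 1/2 < \gamma_4 < \gamma_2 < 1\) and \(\theta\gamma_1 = 1\); \(\gamma_1 + \gamma_2 = 1\), we have the following:

\(\sum_{i \ge (n\gamma_2)} p_i \ge (\gamma_1 + \epsilon)\) and \(\sum_{i \le (n\gamma_1)} p_i \le (\gamma_1 - \epsilon)\). This will be useful in the proof. Note that the \((1 + \beta)\) choice process satisfies these assumptions for \(\epsilon = \beta(1 - 2\gamma_3)/\theta\), since \(p_{(n\gamma_3)} \le (1 - \beta)/n + (2n\gamma_3 - 1)\beta/n^2 \le (1 - \beta(1 - 2\gamma_3))/n\), and similarly \(p_{(n\gamma_4)} \ge (1 + \beta(2\gamma_4 - 1))/n\).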
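
The probability vector of the \((1 + \beta)\) choice process can be written down explicitly under a common modelling assumption: with probability \(\beta\) two bins are drawn uniformly with replacement and the ball goes to the less loaded one, otherwise a single uniform bin is used, and ties are ignored. Under that assumption the \(i\)-th most loaded bin receives the ball with probability \((1 - \beta)/n + \beta(2i - 1)/n^2\). The sketch below uses a hypothetical function name and is meant only to illustrate the monotonicity assumption and the bound on \(p_{(n\gamma_3)}\).

```python
def one_plus_beta_probs(n, beta):
    """Return [p_1, ..., p_n] for the (1 + beta) choice process.

    p_i is the probability that the ball lands in the i-th most loaded bin,
    assuming two uniform draws with replacement (with probability beta) and
    no ties: the i-th most loaded bin wins the two-choice round when both
    draws fall in ranks 1..i and at least one equals i.
    """
    return [(1 - beta) / n + beta * (2 * i - 1) / (n * n)
            for i in range(1, n + 1)]
```

For instance, with `n = 100`, `beta = 0.5` and a hypothetical \(\gamma_3 = 0.25\), the entry at rank 25 is at most \((1 - \beta(1 - 2\gamma_3))/n\).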

In the remaining analysis, we assume that when an md-ball arrives, the selection of the bins is based on \(s_i\), i.e. the total sum of weights across all dimensions for the randomly selected bins (Fig. 1 in Appendix A). In particular, for the \((1 + \beta)\) choice process, when two bins are randomly selected (with probability \(\beta\)), the md-ball (with \(f\) non-zero entries) is assigned to the md-bin with the lowest \(s_i\). Using this selection mechanism, we prove the upper and lower bounds on the gap obtained for the \((1 + \beta)\) choice process. Note that this is a different allocation mechanism than
that considered in fS) where the max objective is considered over the restricted set of / populated dimensions in the 
current md-ball. 

4.2 Upper Bound On the Gap 

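We assume that \(\epsilon < 1/4\). Further, let \(\alpha = \epsilon/2f\). Define the following potential functions:

\[ \begin{aligned}
\Phi(t) &= \Phi(s(t)) = \sum_{i=1}^{n} e^{\alpha s_i} \\
\Psi(t) &= \Psi(s(t)) = \sum_{i=1}^{n} e^{-\alpha s_i}
\end{aligned} \qquad (4.2) \]

where \(s_i(t) = \sum_{d=1}^{D} x_i^d(t)\), and the combined potential is \(\Gamma(t) = \Phi(t) + \Psi(t)\).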

In the beginning, each dimension for each bin has 0 weight, thus \(s_i = 0\ \forall i\) and hence \(\Gamma(0) = 2n\). We show that
if r(x{t)) > an for some a > 0, then E[T(t + l)\x{t)] < (1 - 4^^(\~^^^'^) ) * T(t). This helps in demonstrating that for 
every given \(t\), \(E[\Gamma(t)] \in O(n)\). This implies that the maximum gap is \(O(\log(n))\) w.h.p.
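
The link between a bounded potential and a bounded gap can be illustrated numerically. The sketch below assumes the exponential potentials \(\Phi = \sum_i e^{\alpha s_i}\) and \(\Psi = \sum_i e^{-\alpha s_i}\) with \(\alpha = \epsilon/2f\), which is our reading of (4.2) and matches the stated \(\Gamma(0) = 2n\). Since every term of the sum is positive, \(e^{\alpha |s_i|} \le \Gamma\) for each bin, so \(|s_i| \le \ln(\Gamma)/\alpha\). The function names and sample values are hypothetical.

```python
import math


def exp_potentials(s, eps, f):
    """Return (phi, psi, gamma) for the exponential potentials, alpha = eps / (2 f)."""
    alpha = eps / (2 * f)
    phi = sum(math.exp(alpha * x) for x in s)
    psi = sum(math.exp(-alpha * x) for x in s)
    return phi, psi, phi + psi


def load_bound_from_gamma(gamma, eps, f):
    """Upper bound on max |s_i| implied by the value of gamma."""
    alpha = eps / (2 * f)
    return math.log(gamma) / alpha
```

When \(\Gamma\) is \(O(n)\), the bound above evaluates to \(O(\log(n)/\alpha)\), which is the shape of the gap bound stated in the text.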

First, consider the change in \(\Phi(t)\) (also referred to as \(\Phi\) by default) and \(\Psi(t)\) (also referred to as \(\Psi\) by default) separately when a ball is thrown with the given probability distribution.

Lemma 4.1 When an md-ball is thrown into an md-bin, the following inequality holds:

\[ E[\Phi(t+1) - \Phi(t) \mid x(t)] \le \sum_{i=1}^{n} \left[p_i\,(\alpha f + (\alpha f)^2) - \frac{\alpha f}{n}\right] e^{\alpha s_i} \qquad (4.3) \]

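Proof Let \(\Delta_i\) be the expected change in \(\Phi\) if the ball is put in bin \(i\). So, \(r_i(t+1) = s_i + f(1 - 1/n)\); and for \(j \ne i\), \(r_j(t+1) = s_j(t) - f/n\). The new values, i.e. \(s(t+1)\), are obtained by sorting \(r(t+1)\) and \(\Phi(s) = \Phi(r)\). The expected contribution of bin \(i\) to \(\Delta_i\) is given as follows:

\[ E[e^{\alpha(s_i + f(1-1/n))}] - e^{\alpha s_i} = e^{\alpha s_i}\left[e^{\alpha f(1-1/n)} - 1\right] \]

Similarly, the expected contribution of bin \(j\) (\(j \ne i\)) to \(\Delta_i\) is given as:

\[ E[e^{\alpha(s_j - f/n)}] - e^{\alpha s_j} = e^{\alpha s_j}\left[e^{-\alpha f/n} - 1\right] \]

Therefore, \(\Delta_i\) is given as follows:

\[ \begin{aligned}
\Delta_i &= e^{\alpha s_i}\left[e^{\alpha f(1-1/n)} - 1\right] + \sum_{j \ne i} e^{\alpha s_j}\left(e^{-\alpha f/n} - 1\right) \\
&= e^{\alpha(s_i - f/n)}(e^{\alpha f} - 1) + (e^{-\alpha f/n} - 1)\,\Phi
\end{aligned} \qquad (4.4) \]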



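Thus, we get the overall expected change in \(\Phi\) as follows:

\[ \begin{aligned}
E[\Phi(t+1) - \Phi(t) \mid x(t)] &= \sum_{i=1}^{n} p_i\,\Delta_i \\
&= \sum_{i=1}^{n} p_i\left[e^{\alpha(s_i - f/n)}(e^{\alpha f} - 1) + (e^{-\alpha f/n} - 1)\,\Phi\right] \\
&= \sum_{i=1}^{n} p_i\,e^{\alpha(s_i - f/n)}(e^{\alpha f} - 1) + (e^{-\alpha f/n} - 1)\,\Phi \\
&= \sum_{i=1}^{n} \left[p_i\,e^{-\alpha f/n}(e^{\alpha f} - 1) + (e^{-\alpha f/n} - 1)\right] e^{\alpha s_i}
\end{aligned} \]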

i=l 

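Now, \(e^{-\alpha f/n}(e^{\alpha f} - 1)\) can be approximated as follows:

\[ e^{-\alpha f/n}(e^{\alpha f} - 1) \le \left(1 - \frac{\alpha f}{n} + \left(\frac{\alpha f}{n}\right)^2\right)\left(1 + \alpha f + (\alpha f)^2 - 1\right) \approx \alpha f + (\alpha f)^2 + O\!\left(\frac{(\alpha f)^2}{n}\right) \]
\[ e^{-\alpha f/n}(e^{\alpha f} - 1) \le \alpha f + (\alpha f)^2 \]

Above, since \((\alpha f)^2/n\) is very small for large \(n\), we have ignored the small terms. Similarly, \((e^{-\alpha f/n} - 1) \approx -\alpha f/n\). Hence, the expected change in \(\Phi\) can be given by:

\[ E[\Phi(t+1) - \Phi(t) \mid x(t)] \le \sum_{i=1}^{n} \left[p_i\,(\alpha f + (\alpha f)^2) - \frac{\alpha f}{n}\right] e^{\alpha s_i} \qquad (4.5) \]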

Simplifying further and observing that \(\Phi_i\) decreases and \(p_i\) increases with increasing \(i\) from 1 to \(n\), one gets the following Corollary.

Corollary 4.2 \(E[\Phi(t+1) - \Phi(t) \mid x(t)] \le (\alpha f)^2\,\Phi/n\)

Proof Since the \(p_i\) are increasing and the \(\Phi_i\) are decreasing, the maximum value taken by the RHS of equation (4.5) is attained when \(p_i = 1/n\) for all \(i \in [1..n]\). Simplifying, we get the result.

Similarly, the change in \(\Psi\) can be derived as follows.

Lemma 4.3 When an md-ball is thrown into an md-bin, the following inequality holds:

\[ E[\Psi(t+1) - \Psi(t) \mid x(t)] \le \sum_{i=1}^{n} \left[p_i\,(-\alpha f + (\alpha f)^2) + \frac{\alpha f}{n}\right] e^{-\alpha s_i} \qquad (4.6) \]

Further observing that \(p_i > 0\), one gets the following Corollary.

Corollary 4.4 \(E[\Psi(t+1) - \Psi(t) \mid x(t)] \le (\alpha f\,\Psi)/n\)



In the next two lemmas, Lemma 4.5 and Lemma 4.6, we consider a reasonably balanced md-bins scenario. We show that for such cases, the expected potential decreases. Specifically, for \(s_{(n\gamma_2)} \le 0\), the expected value of \(\Phi\) decreases and for \(s_{(n\gamma_1)} \ge 0\), the expected value of \(\Psi\) decreases.

Lemma 4.5 Let \(\Phi\) be defined as above. If \(s_{(n\gamma_2)}(t) \le 0\) then, \(E[\Phi(t+1) \mid x(t)] \le \left(1 - \frac{\epsilon^2}{4n}\right)\Phi + 1\)

Proof From equation (4.5), we get:



E[$(t + 1) - < Y.(p^ * + - "//")-^ 



< J2(P^* + - + E * + («/)')-e" (4.7) 



'j<n72 
The last inequality follows since \(\alpha f \le 1/2\) and \(e^{\alpha s_{(n\gamma_2)}} \le 1\). Now, we need to upper bound the term \(\sum_{i<n\gamma_2} p_i\,(\alpha f + (\alpha f)^2)\,e^{\alpha s_i}\). Since \(p_i\) is non-decreasing and \(\Phi_i\) is non-increasing, the maximum value is achieved when \(\Phi_i = \Phi/(n\gamma_2)\) for each \(i < n\gamma_2\). Hence, the maximum value is \((\alpha f + (\alpha f)^2)(\gamma_2 - \epsilon)\,\Phi/(n\gamma_2)\). Thus, the expected change in \(\Phi\) can be computed, using equation (4.7) and the above bound, as follows:

\[ \begin{aligned}
E[\Phi(t+1) - \Phi(t) \mid x(t)] &\le (\alpha f + (\alpha f)^2)(\gamma_2 - \epsilon)\,\Phi/(n\gamma_2) - \frac{\alpha f}{n}\,\Phi + 1 \\
&\le (\alpha f)^2\,\Phi/n - \alpha f\,\epsilon\,\Phi/(n\gamma_2) + 1 \\
&\le \epsilon^2\,\Phi/4n - \epsilon^2\,\Phi/(2n\gamma_2) + 1 \\
&\le -\frac{\epsilon^2}{4n}\,\Phi + 1
\end{aligned} \qquad (4.8) \]

Lemma 4.6 Let $\Psi$ be defined as above. If $s_{n\gamma_1}(t) \ge 0$ then $E[\Psi(t+1) \mid x(t)] \le \left(1 - \frac{\epsilon^2}{4n}\right)\Psi + 1$

Proof The proof is similar to that of Lemma 4.5. See Appendix H for details of the proof.

Now, we consider the remaining cases and show that if the load across the bins at time $t$ is not reasonably balanced, then for $s_{n\gamma_2} > 0$ either $\Psi$ dominates $\Phi$ or the potential function is $O(n)$.

Lemma 4.7 Let $s_{n\gamma_2} > 0$ and $E[\Delta\Phi \mid x(t)] \ge -\epsilon^2\Phi/4n$. Then, either $\Phi \le \epsilon\gamma_1 \Psi$ or $\Gamma \le cn$ for some $c = poly(1/\epsilon)$.


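Proof From equation (4.5), we get:

$$\begin{aligned}
E[\Delta\Phi \mid x(t)] &\le \sum_{i=1}^{n}\left(p_i\left(\alpha f + (\alpha f)^2\right) - \frac{\alpha f}{n}\right) e^{\alpha s_i} \\
&\le \sum_{i \le n\gamma_3}\left(p_i\left(\alpha f + (\alpha f)^2\right) - \frac{\alpha f}{n}\right) e^{\alpha s_i} + \sum_{i > n\gamma_3}\left(p_i\left(\alpha f + (\alpha f)^2\right) - \frac{\alpha f}{n}\right) e^{\alpha s_i} \\
&\le \left[\frac{1-\theta\epsilon}{n}\left(\alpha f + (\alpha f)^2\right) - \frac{\alpha f}{n}\right]\Phi_{(\le n\gamma_3)} + \frac{(\alpha f)^2}{n}\Phi_{(> n\gamma_3)} \\
&\le \left[-\frac{\epsilon^2}{2n\gamma_1} + \frac{\epsilon^2}{4n}\right]\Phi_{(\le n\gamma_3)} + \frac{\epsilon^2}{4n}\Phi_{(> n\gamma_3)} \\
&\le \left[-\frac{\epsilon^2}{2n\gamma_1} + \frac{\epsilon^2}{4n}\right]\Phi + \frac{\epsilon^2\theta}{2n}\Phi_{(> n\gamma_3)} \qquad \because\ \theta\gamma_1 = 1
\end{aligned} \qquad (4.9)$$

Now, since $E[\Delta\Phi \mid x(t)] \ge -\epsilon^2\Phi/4n$, we get: $\Phi \le \Phi_{(> n\gamma_3)}/\gamma_2$. Let $B = \sum_i \max\{0, s_i\}$. Note that $\sum_i s_i = 0$, since for each dimension $d$ the update maintains that $\sum_{i=1}^{n} x_i^d = 0$. One can observe that $\Phi_{(> n\gamma_3)} \le n\gamma_4 \, e^{\alpha B/(n\gamma_3)}$, since $\gamma_3 + \gamma_4 = 1$. This implies that $\Phi \le \frac{n\gamma_4}{\gamma_2} e^{\alpha B/(n\gamma_3)}$.

Since $s_{n\gamma_2} > 0$, we have $\Psi \ge n\gamma_1 \, e^{\alpha B/(n\gamma_1)}$. If $\Phi < \epsilon\gamma_1\Psi$ then we are done. Else, $\Phi \ge \epsilon\gamma_1\Psi$. This implies:

$$\frac{n\gamma_4}{\gamma_2} e^{\alpha B/(n\gamma_3)} \ge \Phi \ge \epsilon\gamma_1\Psi \ge \epsilon n\gamma_1^2 \, e^{\alpha B/(n\gamma_1)}$$

Thus, $e^{\alpha B/n} \le \left(\frac{\gamma_4}{\epsilon\gamma_1^2\gamma_2}\right)^{\frac{\gamma_1\gamma_3}{\gamma_3-\gamma_1}}$. So, $\Gamma \le \frac{1+\theta}{\epsilon}\Phi \le \frac{1+\theta}{\epsilon} \cdot \frac{n\gamma_4}{\gamma_2} e^{\alpha B/(n\gamma_3)} \le \frac{1+\theta}{\epsilon} \cdot \frac{n\gamma_4}{\gamma_2}\left(\frac{\gamma_4}{\epsilon\gamma_1^2\gamma_2}\right)^{\frac{\gamma_1}{\gamma_3-\gamma_1}}$. Hence, $\Gamma \le cn$, where $c = poly(1/\epsilon)$.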

In the lemma below, we consider the case where the load across the bins at time $t$ is not reasonably balanced and $s_{n\gamma_1} < 0$. Here, we show that either $\Phi$ dominates $\Psi$ or the potential function is $O(n)$.

Lemma 4.8 Let $s_{n\gamma_1} < 0$ and $E[\Delta\Psi \mid x(t)] \ge -\epsilon^2\Psi/4n$. Then, either $\Psi \le \epsilon\gamma_1\Phi$, or $\Gamma \le cn$ for some $c = poly(1/\epsilon)$.

Proof The proof is similar to that of Lemma 4.7. See Appendix I for details of the proof.

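Now, we consider combinations of the cases considered so far and show that the potential function $\Gamma$ behaves as a super-martingale.

Theorem 4.9 For the potential function $\Gamma$, $E[\Gamma(t+1) \mid x(t)] \le \left(1 - \frac{\epsilon^2(1-2\gamma_1)}{4n(1+\epsilon\gamma_1)}\right)\Gamma(t) + c$, for constant $c = poly(1/\epsilon)$.

Proof We consider the following cases on intervals of values for $s_i$.

• Case 1: $s_{n\gamma_1} \ge 0$ and $s_{n\gamma_2} \le 0$. Using Lemma 4.5 and Lemma 4.6 we can immediately see that $E[\Gamma(t+1) \mid x(t)] \le (1 - \epsilon^2/4n)\Gamma(t) + c$, for constant $c = poly(1/\epsilon)$, and hence the result is also true.

• Case 2: $s_{n\gamma_1} \ge s_{n\gamma_2} > 0$. This represents a high load imbalance across the bins. In some cases, $\Phi$ may grow, but the asymmetry in the load implies that $\Gamma$ is dominated by $\Psi$. Thus, the decrease in $\Psi$ offsets the increase in $\Phi$ and hence the expected change in $\Gamma$ is negative.

Specifically, if $E[\Delta\Phi \mid x] \le -(\epsilon^2/4n)\Phi$ then using Lemma 4.6 we get that $E[\Gamma(t+1) \mid x(t)] \le (1 - \epsilon^2/4n)\Gamma(t) + c$; else we consider the following two cases:

- Case 2a: $\Phi < \epsilon\gamma_1\Psi$. Here, using Lemma 4.6 and Corollary 4.2 we get:

$$\begin{aligned}
E[\Delta\Gamma \mid x] &= E[\Delta\Phi \mid x] + E[\Delta\Psi \mid x] \\
&\le \frac{(\alpha f)^2}{n}\Phi - \frac{\epsilon^2}{4n}\Psi + 1 \\
&\le -(1-\epsilon\gamma_1)\frac{\epsilon^2}{4n}\Psi + 1 \le -\frac{\epsilon^2(1-\epsilon\gamma_1)}{4n(1+\epsilon\gamma_1)}\Gamma + 1
\end{aligned}$$

- Case 2b: $\Gamma < cn$. Here, using Corollary 4.4 and Corollary 4.2 we get:

$$E[\Delta\Gamma \mid x] \le \frac{\alpha f}{n}\Gamma \le c\,\alpha f$$

But $c - \frac{\epsilon^2(1-\epsilon\gamma_1)}{4n}\Gamma \ge c\left(1 - \epsilon^2(1-\epsilon\gamma_1)/4\right) \ge c(1 - \epsilon/2) \ge c\,\alpha f$. Hence, the result follows.

• Case 3: $s_{n\gamma_2} \le s_{n\gamma_1} < 0$. Here, if $E[\Delta\Psi \mid x] \le -(\epsilon^2/4n)\Psi$ then using Lemma 4.5 we get that $E[\Gamma(t+1) \mid x(t)] \le (1 - \epsilon^2/4n)\Gamma(t) + c$; else we consider the following two cases:

- Case 3a: $\Psi < \epsilon\gamma_1\Phi$. Here, using Lemma 4.5 and Corollary 4.4 we get:

$$\begin{aligned}
E[\Delta\Gamma \mid x] &= E[\Delta\Phi \mid x] + E[\Delta\Psi \mid x] \\
&\le -\frac{\epsilon^2}{4n}\Phi + \frac{\alpha f}{n}\Psi + 1 \\
&\le -\frac{\epsilon^2}{4n}\Phi + \frac{\gamma_1\epsilon^2}{2n}\Phi + 1 \\
&\le -\frac{\epsilon^2(1-2\gamma_1)}{4n}\Phi + 1 \le -\frac{\epsilon^2(1-2\gamma_1)}{4n(1+\epsilon\gamma_1)}\Gamma + 1
\end{aligned}$$

- Case 3b: $\Gamma < cn$. Here, using Corollary 4.2 and Corollary 4.4 we get:

$$E[\Delta\Gamma \mid x] \le \frac{\alpha f}{n}\Gamma \le c\,\alpha f$$

Hence, this case follows similarly as Case 2b above.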
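
The potentials used in this case analysis are easy to evaluate for a concrete load vector. The sketch below assumes the definitions $\Phi = \sum_i e^{\alpha s_i}$, $\Psi = \sum_i e^{-\alpha s_i}$ and $\Gamma = \Phi + \Psi$, which match the exponents in equations (4.5) and (4.6); the function name and the sample vectors are hypothetical.

```python
import math

def potentials(s, alpha):
    """Return (Phi, Psi, Gamma) for normalized sum loads s (assumed to sum to 0)."""
    phi = sum(math.exp(alpha * si) for si in s)
    psi = sum(math.exp(-alpha * si) for si in s)
    return phi, psi, phi + psi

n = 8
# Perfectly balanced bins: every s_i = 0, so Phi = Psi = n and Gamma = 2n.
balanced = potentials([0.0] * n, alpha=0.25)
# A skewed (but still zero-sum) load vector, sorted in decreasing order.
skewed = potentials([6.0, 2.0, 0.0, 0.0, -1.0, -2.0, -2.0, -3.0], alpha=0.25)
```

Since $e^{x} + e^{-x} \ge 2$, any imbalance pushes $\Gamma$ above $2n$, which is why a bound on $\Gamma$ translates into a bound on the loads.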
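Now, we can prove using induction that the expected value of $\Gamma$ remains bounded.

Theorem 4.10 For any time $t \ge 0$, $E[\Gamma(t)] \le \frac{4c(1+\epsilon\gamma_1)}{\epsilon^2(1-2\gamma_1)}\, n$

Proof Using induction we can prove this claim. For $t = 0$, it is trivially true since $\Gamma(0) = 2n$. Using Theorem 4.9 we get:

$$\begin{aligned}
E[\Gamma(t+1)] &= E[E[\Gamma(t+1) \mid \Gamma(t)]] \\
&\le E\left[\left(1 - \frac{\epsilon^2(1-2\gamma_1)}{4n(1+\epsilon\gamma_1)}\right)\Gamma(t) + c\right] \\
&\le \frac{4c(1+\epsilon\gamma_1)}{\epsilon^2(1-2\gamma_1)}\, n - c + c = \frac{4c(1+\epsilon\gamma_1)}{\epsilon^2(1-2\gamma_1)}\, n
\end{aligned}$$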
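
The induction can be illustrated by iterating a recurrence of the form $g \leftarrow (1 - r/n)\,g + c$, whose fixed point is $cn/r$. The sketch below is only an illustration: `rate` stands for the constant $r$ multiplying $1/n$ in the contraction factor, and the numeric values are arbitrary assumptions rather than parameters from the paper.

```python
def iterate_bound(n, rate, c, steps, start=None):
    """Iterate g <- (1 - rate/n) * g + c; return the final value and the largest value seen."""
    g = 2.0 * n if start is None else start  # Gamma(0) = 2n in the proof above
    worst = g
    for _ in range(steps):
        g = (1.0 - rate / n) * g + c
        worst = max(worst, g)
    return g, worst

# Hypothetical parameters chosen only for illustration.
n, rate, c = 100, 0.05, 1.0
fixed_point = c * n / rate  # the level at which the "- c + c" step balances
final, worst = iterate_bound(n, rate, c, steps=200_000)
```

Starting below the fixed point, the iterates increase towards $cn/r$ but never cross it, which mirrors the inductive step of the proof.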

Now, we can upper bound the gap across all the $D$ dimensions and all the $n$ md-bins. This gap is defined as follows:

$$Gap(t) = \max_{d}\left[\max_{i} x_i^d\right] \qquad (4.10)$$

Theorem 4.11 Fixed f Case: Using the bias ($p_{n\gamma_3} \le (1-\theta\epsilon)/n$ and $p_{n\gamma_4} \ge (1+\theta\epsilon)/n$) in the probability distribution in favor of lightly loaded md-bins, and assuming that exactly $f$ dimensions are populated in each md-ball with uniform distribution of the $f$ dimensions over $D$, the expected and probabilistic upper bounds on the gap (maximum dimensional gap) across the multidimensional bins are given as follows. Let $\delta = \frac{4c(1+\epsilon\gamma_1)}{\epsilon^2(1-2\gamma_1)}$. Then:

$$E[Gap(t)] \le 2\log(n)/\epsilon + 2\log(\delta)/\epsilon$$
$$Pr[Gap(t) > 4\log(n)/\epsilon + 4\log(\delta)/\epsilon] \le 1/n$$

Proof Let $a$ be the winning md-bin and $m$ be the winning dimension that represents $Gap(t)$. Now, from Theorem 4.10 we get $E[e^{\alpha s_a}] \le n\delta$. So, $E[e^{\alpha(x_a^m + \sum_{d \ne m} x_a^d)}] \le n\delta$. Let $y_a$ denote the gap as measured by the number of md-balls in bin $a$ minus the average number of balls across the bins. Then,

$$\begin{aligned}
E[s_a] &\le \frac{1}{\alpha}\log(n) + \frac{1}{\alpha}\log(\delta) \\
\Rightarrow\ E[s_a] &\le 2f\log(n)/\epsilon + O(2f\log(\delta)/\epsilon) \\
\Rightarrow\ f \cdot E[y_a] &\le 2f\log(n)/\epsilon + O(2f\log(\delta)/\epsilon) \\
\Rightarrow\ E[y_a] &\le 2\log(n)/\epsilon + O(2\log(\delta)/\epsilon)
\end{aligned}$$

The third inequality uses the fact that each ball has exactly $f$ populated dimensions. Since the $f$ dimensions are chosen uniformly at random from the $D$ dimensions, the expected gap in any dimension (and hence in the winning dimension with the maximum gap) is bounded by $O(\log(n)/\beta)$, since $\epsilon = O(\beta)$ (Section 4.1).

Now, $Pr[s_a > 4f\log(n)/\epsilon + (2f/\epsilon)\log(\delta)] \le Pr[\Gamma(t) \ge n\,E[\Gamma(t)]] \le 1/n$ (using Markov's Inequality), where $s_a = \sum_{d=1}^{D} x_a^d$. Hence, $Pr[y_a > 4\log(n)/\epsilon + 4\log(\delta)/\epsilon] \le 1/n$.



Theorem 4.12 Variable f Case: Using the bias ($p_{n\gamma_3} \le (1-\theta\epsilon)/n$ and $p_{n\gamma_4} \ge (1+\theta\epsilon)/n$) in the probability distribution in favor of lightly loaded md-bins, and assuming that each dimension is chosen as 1 with probability $q$ (non-fixed $f$ case), the expected and probabilistic upper bounds on the gap (maximum dimensional gap) across the multidimensional bins are given as follows. Let $\delta = \frac{4c(1+\epsilon\gamma_1)}{\epsilon^2(1-2\gamma_1)}$, then:

$$E[Gap(t)] \le 2\log(n)/\epsilon + m/n(1-q) + 2\log(\delta)/\epsilon$$
$$Pr[Gap(t) > 4\log(n)/\epsilon + m/n(1-q) + 4\log(\delta)/\epsilon] \le 1/n$$

Proof Since each dimension is assigned 1 with probability $q$, the average number of ones per md-ball is $f^* = Dq$. Let $a$ be the winning md-bin and $m$ be the winning dimension that represents $Gap(t)$. The number of ones in any ball, $f$, follows a Binomial($D$, $q$) distribution and has finite second moment. Using an analysis similar to that for Theorem 4.10 we can get (proof omitted for brevity) $E[e^{\alpha s_a}] \le n\delta$, where $\alpha f^* \le \epsilon/2$. So, $E[e^{\alpha(x_a^m + \sum_{d \ne m} x_a^d)}] \le n\delta$. Taking the logarithm of both sides, we get:

$$E[x_a^m] + \sum_{d \ne m} E[x_a^d] \le \log(n)/\alpha + \log(\delta)/\alpha$$

$$E[l_a^m] + \sum_{d \ne m} E[l_a^d] - \left(D \cdot \frac{mq}{n}\right) \le \log(n)/\alpha + \log(\delta)/\alpha \qquad (4.12)$$

In the second inequality, $l_a^d$ represents the load in dimension $d$ for bin $a$, and the average load in each dimension is $mq/n$. If $k$ balls were thrown in bin $a$, then $E[l_a^m] = kq$ and similarly $\sum_{d \ne m} E[l_a^d] = (D-1)kq$. Hence, we get:

$$\begin{aligned}
Dkq &\le 2f^*\log(n)/\epsilon + 2f^*\log(\delta)/\epsilon + mf^*/n \\
\Rightarrow\ k &\le 2\log(n)/\epsilon + 2\log(\delta)/\epsilon + m/n \\
\Rightarrow\ E[x_a^m] &\le 2\log(n)/\epsilon + 2\log(\delta)/\epsilon + m/n(1-q)
\end{aligned}$$

The probabilistic bound can be computed similarly to the fixed $f$ case (Theorem 4.11). □
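
The variable $f$ model can be made concrete with a short sampling sketch. It assumes, as in the proof above, that each of the $D$ dimensions of an md-ball is set to 1 independently with probability $q$, so the number of populated dimensions is Binomial($D$, $q$) with mean $f^* = Dq$. The function name, the seed and the parameter values are illustrative assumptions.

```python
import random

def sample_md_ball(D, q, rng):
    """Return a 0/1 vector of length D; each dimension is 1 independently with probability q."""
    return [1 if rng.random() < q else 0 for _ in range(D)]

rng = random.Random(0)          # fixed seed so the run is reproducible
D, q, trials = 20, 0.3, 20000   # hypothetical parameters

# Number of populated dimensions f for each sampled ball.
counts = [sum(sample_md_ball(D, q, rng)) for _ in range(trials)]
mean_f = sum(counts) / trials   # empirical estimate of f* = D*q
```

The empirical mean of `counts` should be close to $Dq$, while individual balls can have anywhere between 0 and $D$ populated dimensions.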



4.3 Lower Bound

We can show that the upper bound for the fixed $f$ case with uniform distribution, proved in Section 4.2, is tight to within an $f/D$ factor. Consider the case when $an\log(n)/\beta^2$ balls are thrown into $n$ bins using the $(1+\beta)$ choice process. The expected dimensional sum load per bin is $af\log(n)/\beta^2$. Of these balls, the expected number placed by the single-choice step of the $(1+\beta)$ choice process is $an(1-\beta)\log(n)/\beta^2$. Raab and Steger [11] show that when $cn\log(n)$ balls are thrown uniformly at random into $n$ bins, the load of the most loaded bin is at least $(c + \sqrt{c}/10)\log(n)$ balls. Using $c = a(1-\beta)/\beta^2$, one can see that the sum load in the maximum sum load bin is at least:

$$\left(\frac{a(1-\beta)}{\beta^2} + \sqrt{\frac{a(1-\beta)}{100\beta^2}}\right) f\log(n) = \left(\frac{a}{\beta^2} + \frac{\sqrt{a(1-\beta)}}{10\beta} - \frac{a}{\beta}\right) f\log(n)$$

Since each ball has $f$ populated dimensions, there are at least $\Omega(\log(n)/\beta + a\log(n)/\beta^2)$ balls in this maximum sum load bin. Since in each ball the $f$ dimensions are uniformly distributed over the $D$ dimensions, there exists a dimension whose load is at least $\Omega(f\log(n)/(D\beta))$ more than the average. Hence, the lower bound is $\Omega(f\log(n)/(D\beta))$.
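
The setting of this section can be explored with a small simulation. The sketch below assumes the usual definition of the $(1+\beta)$ choice process (with probability $\beta$ two bins are sampled and the ball goes to the one with the smaller dimensional sum load; otherwise a single random bin is used) and the fixed $f$ model with the $f$ populated dimensions drawn uniformly from the $D$ dimensions. All names and parameter values are hypothetical, and the output is not meant to reproduce any figure from the paper.

```python
import random

def simulate(n, m, D, f, beta, seed=0):
    """Throw m md-balls into n md-bins with the (1+beta) rule; return the n x D load matrix."""
    rng = random.Random(seed)
    loads = [[0] * D for _ in range(n)]
    sums = [0] * n  # dimensional sum load of each bin
    for _ in range(m):
        i = rng.randrange(n)
        if rng.random() < beta:          # two-choice step
            j = rng.randrange(n)
            if sums[j] < sums[i]:
                i = j
        for d in rng.sample(range(D), f):  # f distinct populated dimensions
            loads[i][d] += 1
        sums[i] += f
    return loads

def max_dim_gap(loads):
    """Largest (load - average load) over all dimensions and bins, as in equation (4.10)."""
    n, D = len(loads), len(loads[0])
    return max(max(row[d] for row in loads) - sum(row[d] for row in loads) / n
               for d in range(D))

loads = simulate(n=50, m=5000, D=8, f=2, beta=0.5)
observed_gap = max_dim_gap(loads)
```

Running `simulate` for several values of `beta`, `f` and `D` gives a feel for how the per-dimension gap depends on the ratio $f/D$.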

5 Conclusions & Future Work

In this paper, we consider the challenging problem of multidimensional balanced allocation for both the sequential and the parallel $d$ choice process and show that the gap (assuming fixed $f$ populated dimensions per ball and uniform distribution of $f$ over $D$) is $O(\log\log(n))$, which is tight (within a $D/f$ factor of the lower bound). This improves the best prior bound [6] of $O(\log\log(nD))$. Further, for an arbitrary number of balls $m \gg n$, the expected gap also has an upper bound of $O(\log\log(n))$ that is independent of $m$ for the fixed $f$ case with uniform distribution of populated dimensions. For the variable $f$ case with (non-uniform) binomial distribution of populated dimensions, the gap is $O(\log(n))$ for $m = O(n)$. To the best of our knowledge, this is the first such analysis for the $d$-choice paradigm with multidimensional balls and bins.

Our analysis also provides a much easier and more elegant proof technique (as compared to [4]) for the $O(\log\log(n))$ gap for $m \gg n$ scalar balls thrown into $n$ bins using the symmetric multiple choice process. Moreover, for the weighted sequential scalar balls and bins and the general case $m \gg n$, we show an upper bound on the expected gap of $O(\log(n))$, which improves upon the best prior bound of $n^c$ ($c$ depends on the weight distribution that has finite fourth moment) provided in [12]. In future, we would like to generalize the potential function approach for parallel and weighted balls and bins.

Further, we consider the challenging problem of multidimensional balanced allocation for the $(1+\beta)$ choice process and show that for an arbitrarily large number of balls, the expected gap (assuming fixed $f$ populated dimensions per ball and uniform distribution of $f$ over $D$) is $O(\log(n)/\beta)$, which is tight (within a $D/f$ factor of the lower bound) and also independent of $m$. Further, the expected gap is also independent of $m$ for non-uniform distribution of $f$ dimensions over $D$ (with fixed $f$ per ball) and for random $f$ with Binomial distribution.

Figure 1: Multidimensional Balls and Bins Scenario



B Proof of Lemma 3.3

The Lemma is restated below.

Lemma B.1 When an md-ball is thrown into an md-bin, the following inequality holds:

$$E[\Psi(t+1) - \Psi(t) \mid x(t)] \le \sum_{i=1}^{n}\left[p_i\left(-\alpha f + (\alpha f)^2\right) + \frac{\alpha f}{n}\right]\Psi_i \qquad (B.1)$$

Proof Let $\Delta_i$ be the expected change in $\Psi$ if the ball is put in bin $i$. So, $r_i(t+1) = s_i + f(1 - 1/n)$; and for $j \ne i$, $r_j(t+1) = s_j(t) - f/n$. The new values, i.e. $s(t+1)$, are obtained by sorting $r(t+1)$, and $\Psi(s) = \Psi(r)$. The expected contribution of bin $i$ to $\Delta_i$ is given as follows:

$$E[\Psi_i(t+1)] - \Psi_i(t) = \Psi_i\left[e^{-\alpha f(1-1/n)} - 1\right]$$

Similarly, the expected contribution of bin $j$ ($j \ne i$) to $\Delta_i$ is given as:

$$E[\Psi_j(t+1)] - \Psi_j(t) = \Psi_j\left[e^{\alpha f/n} - 1\right]$$

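Therefore, $\Delta_i$ is given as follows:

$$\begin{aligned}
\Delta_i &= \Psi_i\left[e^{-\alpha f(1-1/n)} - 1\right] + \sum_{j \ne i}\Psi_j\left(e^{\alpha f/n} - 1\right) \\
&= \Psi_i e^{\alpha f/n}\left(e^{-\alpha f} - 1\right) + \left(e^{\alpha f/n} - 1\right)\Psi
\end{aligned}$$

Thus, we get the overall expected change in $\Psi$ as follows:

$$\begin{aligned}
E[\Psi(t+1) - \Psi(t) \mid x(t)] &= \sum_{i=1}^{n} p_i \Delta_i \\
&= \sum_{i=1}^{n} p_i\left[\Psi_i e^{\alpha f/n}\left(e^{-\alpha f} - 1\right) + \left(e^{\alpha f/n} - 1\right)\Psi\right] \\
&= \sum_{i=1}^{n} p_i\Psi_i e^{\alpha f/n}\left(e^{-\alpha f} - 1\right) + \left(e^{\alpha f/n} - 1\right)\Psi \\
&= \sum_{i=1}^{n}\left[p_i e^{\alpha f/n}\left(e^{-\alpha f} - 1\right) + \left(e^{\alpha f/n} - 1\right)\right]\Psi_i
\end{aligned}$$

Now, $e^{\alpha f/n}(e^{-\alpha f} - 1)$ can be approximated as follows:

$$\begin{aligned}
e^{\alpha f/n}\left(e^{-\alpha f} - 1\right) &\le \left(1 + \alpha f/n + (\alpha f/n)^2\right)\left(1 - \alpha f + (\alpha f)^2 - 1\right) \\
&\approx -\alpha f + (\alpha f)^2 + O\left((\alpha f)^2/n\right)
\end{aligned}$$

Since $(\alpha f)^2/n$ is very small for large $n$, we ignore these small terms. Hence,

$$e^{\alpha f/n}\left(e^{-\alpha f} - 1\right) \approx \left(-\alpha f + (\alpha f)^2\right)$$

Similarly, $(e^{\alpha f/n} - 1) \approx \alpha f/n$. Hence, the expected change in $\Psi$ can be given by:

$$E[\Psi(t+1) - \Psi(t) \mid x(t)] \le \sum_{i=1}^{n}\left[p_i\left(-\alpha f + (\alpha f)^2\right) + \frac{\alpha f}{n}\right]\Psi_i \qquad (B.3)$$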
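
The two approximations used above can be checked numerically. The following sketch compares the exact expressions with their approximations over a small grid; the grid values for $\alpha f$ and $n$ are arbitrary assumptions chosen so that $\alpha f \le 1/2$.

```python
import math

def exact_terms(alpha_f, n):
    """Exact values of e^{af/n} * (e^{-af} - 1) and e^{af/n} - 1."""
    first = math.exp(alpha_f / n) * (math.exp(-alpha_f) - 1.0)
    second = math.exp(alpha_f / n) - 1.0
    return first, second

def approx_terms(alpha_f, n):
    """Approximations used in the proof: -af + af^2 and af/n."""
    return -alpha_f + alpha_f ** 2, alpha_f / n

checks = []
for alpha_f in (0.05, 0.1, 0.25, 0.5):
    for n in (10, 100, 1000):
        (e1, e2), (a1, a2) = exact_terms(alpha_f, n), approx_terms(alpha_f, n)
        # First term: the approximation is an upper bound on the exact value.
        # Second term: the error is of second order in af/n.
        checks.append((e1 <= a1, abs(e2 - a2) <= (alpha_f / n) ** 2))
```

On this grid the first approximation is a genuine upper bound, while the second is accurate up to a term of order $(\alpha f/n)^2$, consistent with dropping the small terms for large $n$.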

C Proof of Lemma 3.6

Lemma C.l Let * be defined as above. If S(^n-,i)it) > then, E[^{t + l)\x(t)] < (1 - 1^)* 
Proof From equation ( |B.3[ ), we get. 



E[*(i + 1) - ^{t)\x{t)] < J2iP^ * (-«■/ + («■/)') + a-f/n).-^^ 

1=1 



(C.l) 



The last inequality follows since $(-\alpha f + (\alpha f)^2)$ is negative. Now, we need to upper bound the term $\sum_{i > n\gamma_1} p_i(-\alpha f + (\alpha f)^2)\Psi_i$. Since $(-\alpha f + (\alpha f)^2)$ is negative, we need to find the minimum value of $\sum_{i > n\gamma_1} p_i\Psi_i$. Further, $x_{n\gamma_1}(t) \ge 0$, so $\sum_{i > n\gamma_1}\Psi_i \ge \Psi - \ln(n\gamma_1)$. Since $p_i$ is non-decreasing and $e^{-\alpha s_i}$ is non-decreasing, the minimum value of $\sum_{i > n\gamma_1} p_i\Psi_i$ is achieved when $e^{-\alpha s_i}\ln(1/\gamma_1) = (\Psi - \ln(n\gamma_1))$ for each $i > n\gamma_1$. Thus, the minimum value is given as follows.

^ ^ - ln(n7i) 2i - 1 1 

^ M/-ln(n7i) ^ 2(l-7i) ln(l/7i) ^ 
~ ln(l/7i) n rfl 

Thus, the expected change in can be computed, using equation \C\\ and the above bound, as follows: 



< 

- 8n 



ln(l/7i) n V? n 

f(l-7i) 2 
(l/7i) 



^ -2a/v^(i -7i) _ ^"■/(i-;;)^"^^) + g^Zj: + Q((a./)^) + o(i/n^) (c.s) 

nln(l/7i) ln(l/7i) n 



D Proof of Lemma 3.8

Proof From equation (|B.3[), we get: 



i>(n74) «<("74) 



' '^-^ >(n74) 

a/^(-2(l-74)+ln(l74)) -74 ln(l/74) + (1 - 74) ln(n74) 

nln(l/74) +1^«./*<(„T,4)J nln(n74)ln(l/74) 

In the above, the third inequality follows since, (—a./ + (a./)^) is negative andp; > . No w, since E[A4'|x(i)] > 
^1^, we get that, ^E* < 4\1'<(„^^). Let, B = max(0, s^) (as mentioned in Lemma 3.7 1. One can observe that, 
*<(„74) < ln(n74) * e"-^/("'^-''. This implies that, < 41n(n74)e" -^/("'^3), 

Since, s„^j < 0, so, $ > ln(n7i) * e"'^/'"^^^. If < £71 * $, then we are done. Else, 5* > e7i * $. This implies: 

41n(n74)e"-^/("'^^) > > e7i * $ > (£7^ ln(n7i)) * e"-^/("^i) 

Thus, e"--^/" < (^rTH^^)"^- '^O'T ^ + 6')/e) * * < ((e + 6i)/e) *41n(n74)e"-^/("^-'). Hence, T < cln(n), 
where, c = poly{l/e). 



E Proof of Lemma 3.17

The Lemma is restated below. 

Lemma E.l When an md-ball is thrown into an md-bin, the following inequality holds: 

E[*(t + 1) - ^{t)\x{t)] < Y,[P^ * (-"•/* + ^) + "-/V"]*. (E.l) 






Proof Let $\Delta_i$ be the expected change in $\Psi$ if the ball is put in bin $i$. So, $r_i(t+1) = s_i + f(1 - 1/n)$; and for $j \ne i$, $r_j(t+1) = s_j(t) - f/n$. The new values, i.e. $s(t+1)$, are obtained by sorting $r(t+1)$, and $\Psi(s) = \Psi(r)$. When an md-ball is committed to bin $i$, it jumps to an index $i_{new}$ which is less than or equal to $i$ in the new bin order. Using a similar analysis for taking care of these jumps as in Lemma 3.15, the expected contribution of bin $i$ to $\Delta_i$ is given as follows:

-a.(6i+/(l-l/n)) -a. Si 

n^ + n — i + 1 n^+n — i + l 

-[Af(-a(l-l/7i))-l] 



+ n — i + 1 ' 
<*,(~a.r(l-l/n) + ^) 



Similarly, the expected contribution of bin, j {j ^ i) to A^ is given as: 

-Q.(sj-//n) p-a.Sj 



+ n — 7 + 1 ri^ + n — 7 + 1 



+ n — 7 + 1 
-a./* 



[Af(a/ri) - 1] 



Therefore, A^ is given as follows: 

A, = 

Hence, the expected change in can be given by: 



—^ + — 

n 



" c 2 

E[^{t + 1) - vl'(i)|x(t)] < ^b,; * (-a.r + ^) + «•/* H*.: (E.2) 

1=1 



F Proof of Lemma 3.20 



Lemma F.l Let ^ be defined as above. If S(^rin){t) > then, E[^'(t + l)|x(t)] < (1 - 
Proof From equation ( |E.l| l, we get, 

EMt + 1) - *(t)|a;(i)] < * (-a./* + ^) + a./7«)-*^ 



(F.l) 



i>n7i 




The last inequality follows since the {—a.f* + is negative. Now, we need to upper bound the term X]i>n7i Pi * 
{—a.f* + Since, {—a.f* + is negative, we need to find the minimum value of X]i>n7iK * ^i- 

Since Pi is non-decreasing and is non-decreasing, the minimum value of J2i>n-ii Pi * achieved, when, 

g-QSi """■T'l = for each i > nji. Thus, the minimum value is given as follows. 



1^ Pi (1 _^^) n2 * n^ + n-i + 1 

i>n7i 'i>n7i 

^ ^ ^ (2n + l)(l-7i) 
~ n(l — 7i) n 

Thus, the expected change in ^ can be computed, using equation dFTb and the above bound, as follows: 



(F.2) 



n(l — 71) n n 

^ -a/** 2S'a2<If 

~ n n? 11? 

$$\le -\frac{\alpha f^* \Psi}{2n}$$

G Proof of Lemma 3.22



The lemma is restated below: 

Lemma G.l Let, S(„^j) < Q and¥.[IS.^\x{t)] > -a.f*'^/2n. Then, either'^ < £71 * $, or T < c/n for some 
c — poly{l / epsilon). 



Proof From equation ( |E.l| i, we get: 

" o 2 J"* 



^ — ' ^ — ' 

«>("74) ■i<(n74) 

a./*^'(>„^,) (2n+l)(74-l) , a./* 
7174 n n 

a-/**«n74) (271+1)74 

* 

n(74 — 1) n 

aj-vl/ (277 + l)(74-l) , a.f* 

<. * 1 

7774 77 n 

2a./**«„^,) (I-274) 
'^■74(74-1) (I-74) 

(G.l) 

In the above, the third inequality follows since, {—a.f* + ^^) is negative and pi > 0. Now, since E[A4'|a;(<)] > 

. One can
we get: * < *<(„t,4) * ^^^SyT)(4^il-i) - ^ = E» max{0, Si) (as mentioned in Lemma
observe that, *<(„7,) < *i=^e"-^/("-"'^''). This implies that, ^ < ^^^^^^.^i) ^a.B / (n^n-,^) ^ 

Since, s„^i < 0, so, <& > ^ * e°'-^^'-"'^^K If vl* < e7i * $, then we are done. Else, > eji * This impUes: 

4(274 ~ 1) > ^ > ^ $ > ^ gai3/(n7l) 

74(474 - 1) 77 

Thus, e-^/" < (-lijiz^)?^. So, r < ((1 + 9)/e) + e)/e) * ,lgi=ii^e-^/("^^). Hence, 

F < ^, where, c = poly{l/e). 



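H Proof of Lemma 4.6

Lemma H.1 Let $\Psi$ be defined as above. If $s_{n\gamma_1}(t) \ge 0$ then $E[\Psi(t+1) \mid x(t)] \le \left(1 - \frac{\epsilon^2}{4n}\right)\Psi + 1$

Proof From equation (4.6), we get:

$$\begin{aligned}
E[\Psi(t+1) - \Psi(t) \mid x(t)] &\le \sum_{i=1}^{n}\left(p_i\left(-\alpha f + (\alpha f)^2\right) + \frac{\alpha f}{n}\right) e^{-\alpha s_i} \\
&\le \sum_{i > n\gamma_1} p_i\left(-\alpha f + (\alpha f)^2\right) e^{-\alpha s_i} + \frac{\alpha f}{n}\Psi
\end{aligned} \qquad (H.1)$$

The last inequality follows since $(-\alpha f + (\alpha f)^2)$ is negative. Now, we need to upper bound the term $\sum_{i > n\gamma_1} p_i(-\alpha f + (\alpha f)^2) e^{-\alpha s_i}$. Since $(-\alpha f + (\alpha f)^2)$ is negative, we need to find the minimum value of $\sum_{i > n\gamma_1} p_i e^{-\alpha s_i}$. Further, $x_{n\gamma_1}(t) \ge 0$, so $\sum_{i > n\gamma_1}\Psi_i \ge \Psi - n\gamma_1$. Since $p_i$ is non-decreasing and $\Psi_i$ is non-decreasing, the minimum value of $\sum_{i > n\gamma_1} p_i e^{-\alpha s_i}$ is achieved when $\Psi_i = (\Psi - n\gamma_1)/(n\gamma_2)$ for each $i > n\gamma_1$. Using the assumption that $\sum_{i > n\gamma_1} p_i \ge (\gamma_2 + \epsilon)$, the minimum value is $(\gamma_2 + \epsilon)(\Psi - n\gamma_1)/(n\gamma_2)$. Thus,

$$\sum_{i > n\gamma_1} p_i\left(-\alpha f + (\alpha f)^2\right) e^{-\alpha s_i} \le \left(-\alpha f + (\alpha f)^2\right)(\gamma_2 + \epsilon)\frac{\Psi - n\gamma_1}{n\gamma_2} \qquad (H.2)$$

Thus, the expected change in $\Psi$ can be computed, using equation (H.1) and the above bound, as follows:

$$\begin{aligned}
E[\Psi(t+1) - \Psi(t) \mid x(t)] &\le \left(-\alpha f + (\alpha f)^2\right)(\gamma_2 + \epsilon)\frac{\Psi - n\gamma_1}{n\gamma_2} + \frac{\alpha f}{n}\Psi \\
&\le -\frac{\epsilon^2}{2n\gamma_2}\Psi + \frac{\epsilon^2}{4n}\Psi + 1 \qquad (H.3) \\
&\le -\frac{\epsilon^2}{4n}\Psi + 1
\end{aligned}$$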


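I Proof of Lemma 4.8

Proof From equation (4.6), we get:

$$\begin{aligned}
E[\Delta\Psi \mid x(t)] &\le \sum_{i=1}^{n}\left(p_i\left(-\alpha f + (\alpha f)^2\right) + \frac{\alpha f}{n}\right) e^{-\alpha s_i} \\
&\le \sum_{i > n\gamma_4}\left(p_i\left(-\alpha f + (\alpha f)^2\right) + \frac{\alpha f}{n}\right) e^{-\alpha s_i} + \sum_{i \le n\gamma_4}\left(p_i\left(-\alpha f + (\alpha f)^2\right) + \frac{\alpha f}{n}\right) e^{-\alpha s_i} \\
&\le \left[\frac{1+\theta\epsilon}{n}\left(-\alpha f + (\alpha f)^2\right) + \frac{\alpha f}{n}\right]\Psi_{(> n\gamma_4)} + \frac{\alpha f}{n}\Psi_{(\le n\gamma_4)} \\
&\le \left[-\frac{\alpha f\theta\epsilon}{n} + \frac{\epsilon^2}{4n}\right]\Psi_{(> n\gamma_4)} + \frac{\alpha f}{n}\Psi_{(\le n\gamma_4)} \\
&\le \left(-\frac{\epsilon^2}{2n\gamma_1} + \frac{\epsilon^2}{4n}\right)\Psi + \left(\frac{\alpha f}{n} + \frac{\epsilon^2}{2n\gamma_1}\right)\Psi_{(\le n\gamma_4)}
\end{aligned} \qquad (I.1)$$

In the above, the third inequality follows since $(-\alpha f + (\alpha f)^2)$ is negative and $p_i \ge 0$; and $\forall i > n\gamma_4$, $p_i \ge (1+\theta\epsilon)/n$. Now, since $E[\Delta\Psi \mid x(t)] \ge -\epsilon^2\Psi/4n$, we get that $\left(\frac{\epsilon^2}{2n\gamma_1} - \frac{\epsilon^2}{2n}\right)\Psi \le \left(\frac{\epsilon}{2n} + \frac{\epsilon^2}{2n\gamma_1}\right)\Psi_{(\le n\gamma_4)}$.

Thus, we get: $\Psi \le \frac{1+\theta\epsilon}{\epsilon\gamma_2}\Psi_{(\le n\gamma_4)}$. Let $B = \sum_i \max(0, s_i)$ (as mentioned in Lemma 4.7). One can observe that $\Psi_{(\le n\gamma_4)} \le (n\gamma_4)\, e^{\alpha B/(n\gamma_3)}$. This implies that $\Psi \le \frac{1+\theta\epsilon}{\epsilon\gamma_2}(n\gamma_4)\, e^{\alpha B/(n\gamma_3)}$.

Since $s_{n\gamma_1} < 0$, we have $\Phi \ge n\gamma_1\, e^{\alpha B/(n\gamma_1)}$. If $\Psi < \epsilon\gamma_1\Phi$, then we are done. Else, $\Psi \ge \epsilon\gamma_1\Phi$. This implies:

$$\frac{n(1+\theta\epsilon)\gamma_4}{\epsilon\gamma_2} e^{\alpha B/(n\gamma_3)} \ge \Psi \ge \epsilon\gamma_1\Phi \ge \left(\epsilon n\gamma_1^2\right) e^{\alpha B/(n\gamma_1)}$$

Thus, $e^{\alpha B/n} \le \left(\frac{(1+\theta\epsilon)\gamma_4}{\epsilon^2\gamma_1^2\gamma_2}\right)^{\frac{\gamma_1\gamma_3}{\gamma_3-\gamma_1}}$. So, $\Gamma \le \frac{1+\theta}{\epsilon}\Psi \le \frac{1+\theta}{\epsilon} \cdot \frac{(1+\theta\epsilon)\gamma_4 n}{\epsilon\gamma_2} e^{\alpha B/(n\gamma_3)}$. Hence, $\Gamma \le cn$, where $c = poly(1/\epsilon)$.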